Abstract


The next generation of network-centric applications will utilize a large number of
computing and storage systems that are connected by global high-speed networks. We refer
to the environment that provides transparent computing and communication services for
large-scale parallel and distributed applications as a metacomputing environment. In this
project, we present the design of, and the experimental results with, the Virtual
Distributed Computing Environment (VDCE) and the Adaptive Distributed Virtual Computing
Environment (ADViCE) being developed at The University of Arizona and Syracuse University.

The VDCE provides an efficient web-based approach for developing, running, evaluating
and visualizing large-scale parallel and distributed applications that utilize computing
resources connected by local and/or wide area networks. The VDCE task libraries relieve
end-users of tedious task implementations and also support reusability. The VDCE
software architecture is described in terms of three modules: a) the Application Editor, a
user-friendly application development environment that generates the Application Flow
Graph (AFG) of an application; b) the Application Scheduler, which provides an efficient
task-to-resource mapping of the AFG; and c) the VDCE Runtime System, which is responsible
for running and managing application execution and for monitoring the VDCE resources.
We present experimental results of an application execution on the VDCE prototype for
evaluating the performance of different machine and network configurations. We also
show how VDCE can be used as a problem-solving environment on which large-scale,
network-centric applications can be developed by a novice programmer rather than by an
expert in low-level details of parallel programming languages.

ADViCE, which is an extension of the VDCE, aims at supporting mobile computing and
communication resources. ADViCE supports transparent access to the development,
computing and communication services that are offered, regardless of whether the users
are connected through fixed or mobile networks. In addition, the ADViCE resources can
also be connected through mobile as well as fixed networks. The ADViCE architecture
consists of two independent servers: the Visualization and Editing Server (VES) and the
Control and Management Server (CMS). These two servers provide all the services required
in an efficient parallel and distributed programming environment. The ADViCE services
include the Application Editing Service, Application Visualization Service, Application
Resource Service, Application Management Service, Application Control Service and
Application Data Service. We also present experimental results that evaluate the
performance and effectiveness of the ADViCE prototype in providing three important
functions: 1) Evaluation Tool: to analyze the performance of parallel applications
with different machine and network configurations; 2) Problem-Solving Environment:
to assist in the development of large-scale parallel and distributed applications; and 3)
Application-Transparent Adaptivity: to allow parallel and distributed applications to run
in a transparent manner when their clients and resources are fixed or mobile.


1  Introduction 

Grand challenge problems have computational and storage resource requirements that
are beyond the capacities of a single computing environment. Additionally, emerging
network technologies such as fiber-optic transmission facilities and the Asynchronous
Transfer Mode (ATM) enable data to be transferred at the rate of a gigabit per second
(Gbps). Since high-speed networks have become more common and provide low-latency
communication services that are close to those offered by massively parallel processors
(MPPs), there is a growing interest in combining the computational and storage resources
that are available over wide area networks to build a new execution environment called
metacomputing [21]. New software tools and techniques are required to utilize the
metacomputing resources, which are not fully supported by existing parallel or distributed
software. The heterogeneous and dynamic nature of a metacomputing environment limits
the use of existing parallel computing tools; similarly, the existing distributed systems
may not provide the high performance that is a key target in a metacomputing environment.

In this report we present the design of the Virtual Distributed Computing Environment
(VDCE) that has been developed at Syracuse University. In addition, we present the
design and the experimental results of the Adaptive Distributed Virtual Computing
Environment (ADViCE), which is an extension of the VDCE to support mobile computing
and communication resources.

VDCE [8, 9] provides an efficient mechanism to execute large-scale applications on
distributed and diverse platforms. The main goal of the VDCE project is to develop an
easy-to-use, integrated software development environment that provides software tools
and middleware software to handle all the issues related to developing parallel and
distributed applications, scheduling tasks onto the best available resources, and managing
the Quality of Service (QoS) requirements.

VDCE is a three-tiered software architecture that consists of an Application Editor
to assist in application development and specification, an Application Scheduler to
perform transparent application scheduling and resource configuration, and a VDCE
Runtime System to run and manage the application execution. The Application Editor is a
web-based graphical user interface that helps users to develop parallel and distributed
applications. In VDCE the application development process is based on a dataflow
programming paradigm. The Application Editor generates its output in terms of an
Application Flow Graph (AFG) in which the nodes represent task computations and links
denote communication and/or synchronization among the nodes (tasks). The Application
Editor provides menu-driven, functional building blocks of task libraries. A node of an
AFG is a well-defined function or task selected from a given task library. VDCE provides
a large set of task libraries grouped in terms of their functionality, such as matrix
operations, Fourier analysis, C3I (command, control, communication, and information)
applications, etc.


VDCE provides a distributed runtime scheduler, the Application Scheduler, which
provides efficient task-to-resource mapping of application flow graphs. The Application
Scheduler uses performance prediction of individual tasks to achieve efficient resource
allocations. The scheduling decision is based on the task specifications (i.e.,
hardware/software requirements) in the application flow graph, locations and
configurations of resources, and up-to-date resource loads. The VDCE Runtime System
consists of two parts: the Control Virtual Machine (CVM) and the Data Virtual Machine
(DVM). The CVM is responsible for monitoring the VDCE resources, setting up the execution
environment for a given application, monitoring the execution of the application tasks on
the assigned computers, and maintaining the performance, fault tolerance, and quality of
service (QoS) requirements. The DVM is responsible for providing low-latency and
high-speed communication and synchronization services for inter-task communications.

The main goal of the ADViCE project is to extend the current VDCE to support
mobile users and resources. ADViCE provides a parallel and distributed programming
environment with an efficient web-based user interface that allows users to develop,
run and visualize parallel/distributed applications running on heterogeneous computing
resources connected by wired and wireless networks. Consequently, the fact that some of
the resources (such as users, computers, storage devices and networks) are mobile becomes
transparent to the users and the application developers.

The  rest  of  the  report  is  organized  as  follows.  Section  2  is  a  summary  of  the  related 
work.  In  Section  3  we  present  the  design  and  implementation  issues  of  the  VDCE  software 
architecture.  Section  4  presents  experimental  results  and  evaluation  of  the  current  VDCE 
prototype.  Section  5  presents  the  architecture  and  experimental  results  with  ADViCE. 
Section 6 presents concluding remarks and future work.


2  Related  Work 

In this section we provide a review of related work on the software development process,
followed by related work on metacomputing. The software development process of parallel
and distributed applications can broadly be described in terms of three phases: a)
application design and specification, b) application scheduling and resource configuration,
and c) application execution and runtime.

In a well-integrated execution environment it is important to provide: a) an easy-to-use
interactive user interface to design and specify parallel and distributed applications and,
b) well-developed graphical utilities for visualization of results and program behavior.
Generally, writing parallel/distributed programs overwhelms users due to the difficulty
of explicitly expressing communication and synchronization among the computations [7].
A graph-based programming environment, in which a program is defined as a directed
graph where nodes denote computations and links denote communication and
synchronization between nodes, may be used to decrease the work of programmers. Currently,
there are a few visual parallel programming languages and environments, such as
Computationally Oriented Display Environment (Code) [11], Heterogeneous Network Computing
Environment (HeNCE) [12], and Zoom [13]. To develop a Code or HeNCE application, a
programmer first expresses the sequential computations in a standard language and then
specifies how they are to be composed into a parallel program. Zoom is a hierarchical
representation abstraction for describing heterogeneous applications. The Zoom
representation of an application can be translated into a HeNCE program for execution [12].
On the other hand, application development tools and environments are being modified to
support web-based user interfaces, since the World Wide Web is becoming a low-cost,
standard interface mechanism with which to access the computational resources that are
distributed all over the world [29].

After a parallel/distributed application is developed, the tasks of the application are
assigned to the available resources. Although the task scheduling (or resource
allocation) problem has been investigated extensively in the literature, most of the
algorithms and systems are valid only for specific architectures and/or applications. One
of the few research groups targeting a general scheduling framework is the AppLeS [14]
group. AppLeS proposes application-level scheduling in which everything about the system
is evaluated in terms of its impact on the application. AppLeS develops a customized
schedule for each application by including user-specific, application-specific,
system-specific, and dynamic information in its scheduling decision. The Network Weather
Service component provides dynamic information. The Heterogeneous Application Template
provides specific information about the structure of the application. User-supplied
information is entered into the system with a user-specification file. There are also
resource management systems that provide load sharing and resource allocation. One of
these is the Condor [31] project, developed at the University of Wisconsin, a distributed
batch system for sharing the workload of compute-intensive jobs in a pool of UNIX
workstations connected by a network.

The application execution and runtime phase executes the developed and configured
application and produces the required output. This stage integrates the assigned
resources that will be involved in execution and supports inter-module communications,
which are based on either a message-passing tool such as PVM [23], P4 [25], MPI [24],
or NCS [26], or on a distributed shared memory (DSM) model. During the execution of
the application this stage accepts data from different computing elements and combines
them for proper visualization. It intercepts the error messages generated and provides
proper interpretation. Some of these message-passing tools may be used in a
metacomputing environment, although they were initially developed for parallel and
distributed applications. In the first I-WAY metacomputing testbed, Nexus and MPI
communication libraries were used within the prototype implementations of Globus
communications.

In addition, there are a few projects targeted toward providing a metacomputing
environment on diverse resources. The earliest metacomputer, the NCSA Metacomputer [27],
was an integration of several MPPs, mass storage units, visualization and I/O devices.
Globus [21] and Legion [28] are among the most recent projects targeted toward solving
metacomputing problems. A low-level toolkit in the Globus environment provides
mechanisms such as communication, authentication, and network information. These
mechanisms can be used to construct higher-level metacomputing services such as parallel
programming tools, schedulers, etc. On the other hand, Legion is a distributed-object
metacomputing environment that is targeted to support a wide set of tools, languages,
and programming models. The major objectives of the Legion project are site autonomy,
an easy-to-use seamless computational environment, high performance via parallelism,
security for users and resource owners, management and exploitation of resource
heterogeneity, multiple language support and interoperability, and fault tolerance.
Additionally, there are several web-based metacomputing projects [29] that either use the
JAVA programming language as the main computation language or provide a coordination
medium based on WWW technologies or the JAVA language. There may be some drawbacks to
these methods. First, they may not support programs written in other languages such
as C and Fortran. Second, they may support communication only between a server and
a client, which restricts the execution of the candidate applications.


3  Overview  of  VDCE  Software  Architecture 

The main design philosophy of VDCE is to provide a general software development
environment to build and execute large-scale applications on a network of heterogeneous
resources. VDCE is composed of geographically distributed computation sites (domains),
as shown in Figure 1, each of which has one or more VDCE Servers. The words “site”
and “domain” are used interchangeably in this paper. Each domain consists of several
clusters, each of which includes resources that are heterogeneous in terms of type, speed,
or configuration. At each site the VDCE Server runs the server software, called the site
manager, which handles inter-site communications and bridges VDCE modules to the
web-based site repository. The site manager is part of the Control Virtual Machine, which
is explained in Section 3.3.


Figure  1:  Virtual  Distributed  Computing  Environment  (VDCE) 

The site repository consists of four different database tables. The user-accounts
table is used to handle user authentication. In the user-accounts table, each VDCE user
account is represented by a 5-tuple: user name, password, user ID, priority, and access
domain type. The resource-performance table provides the resource (machine and
network) performance attributes/parameters. These attributes are grouped into two parts:
a) static performance attributes stored in the database once during the initial
configuration of VDCE: host name, IP address, architecture type, operating system type,
total memory size, and the computing weight (which will be described later in the
Application Scheduler section) of each processor with respect to a base processor; and b)
dynamic performance attributes that are updated periodically: CPU load, network latency,
network bandwidth, available memory size, number of processes, etc. The task-performance
table provides performance characteristics for each task in the system and is used to
predict the performance of the task on a given resource. Each task implementation is
specified by some parameters: computation size, communication size, and required
memory size. For each task in VDCE, the task-performance table includes an entry for the
measured execution time of benchmarking the task per machine type as well as the CPU
loads when the measurements are taken. In order to find the location of a task’s
executable, VDCE stores location information of each task (i.e., the absolute path of the
task executable) as well as other restrictions that might be related to the task execution
for each host in the task-constraints table. Due to specific library requirements or other
license restrictions, some task executables may reside only on a subset of the VDCE hosts.
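
As a rough illustration, the four repository tables could be modeled with records like the following. This is only a sketch: the class names, field names, types, and units are assumptions made for the example and are not taken from the VDCE implementation.

```python
from dataclasses import dataclass, field


@dataclass
class UserAccount:
    """One row of the user-accounts table (the 5-tuple described above)."""
    user_name: str
    password: str
    user_id: int
    priority: int
    access_domain_type: str


@dataclass
class ResourcePerformance:
    """One row of the resource-performance table (units are assumed)."""
    # Static attributes, stored once during initial configuration.
    host_name: str
    ip_address: str
    architecture_type: str
    operating_system_type: str
    total_memory_mb: int
    computing_weight: float  # relative to a base processor
    # Dynamic attributes, refreshed periodically.
    cpu_load: float = 0.0
    network_latency_ms: float = 0.0
    network_bandwidth_mbps: float = 0.0
    available_memory_mb: int = 0
    num_processes: int = 0


@dataclass
class TaskPerformance:
    """One row of the task-performance table."""
    task_name: str
    computation_size: int
    communication_size: int
    required_memory_mb: int
    # machine type -> (measured execution time, CPU load during the measurement)
    benchmarks: dict = field(default_factory=dict)


@dataclass
class TaskConstraint:
    """One row of the task-constraints table: where an executable lives."""
    task_name: str
    host_name: str
    executable_path: str  # absolute path of the task executable


def hosts_for_task(constraints, task_name):
    """Return the sorted host names on which the task executable is installed."""
    return sorted(c.host_name for c in constraints if c.task_name == task_name)
```

A lookup such as the hypothetical `hosts_for_task` reflects the point made above that a task executable may reside on only a subset of the hosts.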


Figure  2:  Interactions  Among  the  VDCE  Modules 

The software development cycle for network applications can be viewed in terms of
three phases: the application development and specification phase, the application
scheduling and configuration phase, and the execution and runtime phase. The
functionality of these three phases is handled by the Application Editor, Application
Scheduler, and VDCE Runtime System, respectively. Figure 2 shows the interaction of the
VDCE modules within a site. In the following subsections we describe in detail the design
and prototype implementation issues of the three main software modules.

3.1  Application  Editor 

The Application Editor is a web-based graphical user interface for developing parallel
and distributed applications. The end-user establishes a URL connection to the VDCE
Server software within the site (the Site Manager), which runs on a VDCE Server (see
Figure 3). The Site Manager implementation is based on JAVA Web server technology,
which uses servlets (i.e., server-side JAVA applets) that relieve the startup overheads
and run on any platform. After user authentication (as shown in Figure 3), the Application
Editor, which was implemented in JAVA, is loaded into the user’s local web browser
so that the user can develop his/her application.

The Application Editor provides menu-driven task libraries that are grouped in terms
of their functionality, such as the matrix algebra library, C3I (command and control
applications) library, etc. A selected task is represented as a clickable and draggable
graphical icon in the active editor area. Each such icon includes the task name and a set
of markers for logical ports. Color coding used in this visual representation helps to
distinguish input ports from output ports. Operationally, the Application Editor can be in
task mode, link mode, or run mode. In task mode, the user can select/add new tasks, and/or
click/drag icons to position them conveniently in the active editor area. In link mode,
the user can specify connections between tasks. In run mode, the Editor submits the graph
for execution and visualizes the performance and runtime characteristics of an ongoing
computation.

Figure 3: VDCE Authentication Window

The process of building a VDCE application with the Application Editor can be
divided into two steps: building the application flow graph (AFG), and specifying the task
properties of the application. The application flow graph is a directed acyclic graph,
G = (T, L), where T is the set of tasks in the application and L is a set of directed
links among tasks. A directed link (i, j) between two tasks, T_i and T_j, of the
application indicates that T_i must complete its execution before T_j begins to run.
Figure 4 shows the application flow graph of a Linear Equation Solver (based on LU
Decomposition) developed using the Application Editor. In this application, the problem is
to find the solution vector x in an equation Ax = b, where A is a known N x N matrix and
b is a known vector. With LU Decomposition, the matrix A is decomposed into the product
of a lower triangular matrix L and an upper triangular matrix U. Once the LU Decomposition
is computed, the solution vector, x, is derived with x = U^{-1} L^{-1} b. To construct the
flow graph of this application, the user creates nodes by selecting LU_Decomposition,
Matrix_Inverse(2), and Matrix_Multiply(2) tasks from the Matrix_Operations menu.
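
The following sketch shows one way an AFG could be held in memory and checked for the acyclicity that the definition requires. It assumes the solver evaluates x = U^{-1} (L^{-1} b) with one LU task, two inverse tasks, and two multiply tasks. The task labels (`LU`, `INV_L`, `INV_U`, `MULT_1`, `MULT_2`) and the exact links are hypothetical and may differ from the graph in Figure 4.

```python
from collections import deque


def topological_order(tasks, links):
    """Return the tasks in an order that respects every directed link (i, j).

    Raises ValueError if the links contain a cycle, i.e. if the graph is not
    a valid application flow graph.
    """
    indegree = {t: 0 for t in tasks}
    successors = {t: [] for t in tasks}
    for i, j in links:
        successors[i].append(j)
        indegree[j] += 1

    # Kahn's algorithm: repeatedly release tasks whose predecessors are done.
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for s in successors[t]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)

    if len(order) != len(tasks):
        raise ValueError("the graph contains a cycle")
    return order


# Hypothetical AFG for the Linear Equation Solver.
tasks = ["LU", "INV_L", "INV_U", "MULT_1", "MULT_2"]
links = [
    ("LU", "INV_L"),       # L is inverted after the decomposition
    ("LU", "INV_U"),       # U is inverted after the decomposition
    ("INV_L", "MULT_1"),   # MULT_1 computes y = L^{-1} b
    ("MULT_1", "MULT_2"),  # MULT_2 computes x = U^{-1} y
    ("INV_U", "MULT_2"),
]
order = topological_order(tasks, links)
```

In this layout the two inverse tasks have no link between them, so they are candidates for concurrent execution on different resources.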

After the application flow graph is generated, the next step in the application
development process is to specify the properties of each task. A double click on any task
icon generates a popup panel that allows the user to specify optional preferences such as
computational mode (sequential or parallel); domain type (Syracuse University or Rome Lab);
cluster type (HPDC cluster, CAT cluster, TOP cluster, Rome Lab cluster); communication
type (P4, socket, MPI, DSM, NCS, PVM); thread type (none, pthread, qthread, cthread);
communication protocol type (TCP/IP, ATM); machine type (SUN SPARC, RS6000, Pentium PC,
HP); and the number of processors to be used in a parallel implementation of a given task
(see the right part of Figure 4). In this figure, for the MULT task of the Linear Equation
Solver the user has selected the parallel execution mode using two nodes of Sun SPARC
machines interconnected by an ATM network. When the task properties are specified the
user may either submit the application for execution in the VDCE or store the application
flow graph for future use.

Figure 4: Building the Linear Equation Solver Application with the Application Editor

3.2  Application  Scheduler 

The main function of the Application Scheduler module in VDCE is to interpret the
application flow graph and to assign the current best available resources for running
application tasks in order to minimize the total execution time in a transparent manner.
This module is based on an application-based scheduling framework [14, 15] that is
currently being implemented. VDCE provides distributed scheduling in a wide-area system
in which each site has its own Application Scheduler running on the VDCE server. The
Application Scheduler has two scheduling algorithms, explained below: the site scheduler
algorithm and the host selection algorithm. The schedule of an AFG is determined by the
VDCE server at the local site, which runs the site scheduler algorithm, and a set of
selected remote sites that execute the host selection algorithm. Table 1 gives the
meanings of the symbols used in the algorithms.

The site scheduler algorithm and host selection algorithm are based on the list
scheduling [16, 17, 18] heuristic. In list scheduling each node (task) of the graph is
assigned a priority and stored in an ordered list. In this paper the terms node and task
are used interchangeably. Whenever a processor is available for execution the highest
priority task in the list is assigned to this processor. This process is repeated until
all nodes of the graph are covered. The difference among the list scheduling heuristics is
the way in which they assign priorities to nodes. The different priority assignment
methods lead to different selection orders that result in different schedules.

Table 1: Symbols and their meanings

Symbol                       Meaning
---------------------------  ----------------------------------------------------------------
AFG                          Application Flow Graph.
Site_List                    The list of sites that will be part of the scheduling process.
S_local                      The site that has received the application execution request.
S_remote                     The set of selected k neighbor sites of S_local.
BW(S_i, S_j)                 The network bandwidth between sites S_i and S_j.
LT(S_i, S_j)                 The network latency between sites S_i and S_j.
Pred_Time(task_i, S_j)       The best predicted execution time of task_i at S_j.
pred_time(task_i, P_j)       The predicted execution time of task_i on P_j.
EST(task_i, S_j)             The earliest start time of task_i at site S_j.
EFT(task_i, S_j)             The earliest finish time of task_i at site S_j.
Predecessor(task_i)          The set of nodes that are immediate predecessors of task_i.
Exec_Time(task_i, P_test)    The measured execution time of task_i on P_test for the trial run.
C_Load(P_j)                  The recent CPU load of P_j.
M_Load(P_test)               The CPU load of P_test at the time of the trial run.
Weight(P_j)                  The computing weight of P_j with respect to a base processor.

We use the level of each node to determine its priority [17]. The level of a node is
defined as the length of the longest path from the node to a terminal (or exit) node.
The length of a path in the task graph is measured by the summation of all node weights
and edge weights along the path. The node weight is the predicted execution time of
the task, and the edge weight is the predicted intertask communication time. Some
previous works do not consider the edge weight when calculating the level of a node. For a
node weight, we use the execution time of the task (node) on a predefined base processor
within the site. The weight of an edge between task i and task j is measured by dividing
the data size to be sent from task i to task j, D(i, j), by a base communication-link
bandwidth, BW_base. We assume that each AFG has only one root node and one exit node.
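
The level definition above can be sketched as a short recursive computation. The function name, the dictionary-based inputs, and the example numbers are assumptions chosen for illustration and are not part of VDCE.

```python
def compute_levels(exec_time, data_size, bw_base):
    """Compute the level of every task in an acyclic task graph.

    exec_time: task -> execution time on the base processor (node weight)
    data_size: (i, j) -> data sent from task i to task j, i.e. D(i, j)
    bw_base:   base communication-link bandwidth
    The level of a task is its own weight plus the longest weighted path
    (edge weights included) from it to the exit node.
    """
    successors = {t: [] for t in exec_time}
    for i, j in data_size:
        successors[i].append(j)

    levels = {}

    def level(t):
        if t not in levels:
            tail = 0.0
            for s in successors[t]:
                edge_weight = data_size[(t, s)] / bw_base
                tail = max(tail, edge_weight + level(s))
            levels[t] = exec_time[t] + tail
        return levels[t]

    for t in exec_time:
        level(t)
    return levels


# Small example with one root (A) and one exit (D).
exec_time = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 2.0}
data_size = {("A", "B"): 10, ("A", "C"): 20, ("B", "D"): 10, ("C", "D"): 10}
levels = compute_levels(exec_time, data_size, bw_base=10)
# levels == {"A": 9.0, "B": 6.0, "C": 4.0, "D": 2.0}

# The task list is ordered by non-increasing level.
task_list = sorted(levels, key=lambda t: -levels[t])
# task_list == ["A", "B", "C", "D"]
```

Because every weight is positive, a task always has a higher level than its successors, so a list sorted by non-increasing level never places a task ahead of one of its predecessors.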


Site  Scheduler  Algorithm 

In this algorithm, the next step after initializing the Task_List with level values of
AFG nodes is to select a set of remote sites that will be part of the scheduling process
and that may possibly be part of the execution process. If the update_request flag is
true, it indicates that one or more sites in S_remote have high network traffic (or are
down). In this case, the remote sites are selected according to the network bandwidth
between the remote site and the local site (shown in steps 4-8). Otherwise, the
previously stored set is used. Then, the AFG and Task_List are multicast to the involved
sites for bidding, after which the Host_Selection_Algorithm is executed at each site
(step 12).
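
The remote-site selection described here (steps 3-9 of the algorithm) might look roughly like the sketch below. The function names, the dictionary of measured bandwidths, and the site labels are hypothetical, and persisting the result back to the resource-performance table is left out.

```python
def select_remote_sites(local_site, bandwidth_to_local, k):
    """Pick the k remote sites with the highest bandwidth to the local site."""
    candidates = [s for s in bandwidth_to_local if s != local_site]
    candidates.sort(key=lambda s: bandwidth_to_local[s], reverse=True)
    return candidates[:k]


def build_site_list(local_site, stored_remote, update_request,
                    bandwidth_to_local, k):
    """Return (site_list, update_request) for the bidding phase.

    The stored remote set is reused unless update_request is set, in which
    case it is recomputed from current bandwidths and the flag is cleared.
    """
    if update_request:
        stored_remote = select_remote_sites(local_site, bandwidth_to_local, k)
        update_request = False
    return [local_site] + list(stored_remote), update_request


# Hypothetical bandwidth measurements from site_a to its neighbors.
bandwidth = {"site_b": 45.0, "site_c": 155.0, "site_d": 10.0}
site_list, flag = build_site_list("site_a", ["site_d"], True, bandwidth, 2)
# site_list == ["site_a", "site_c", "site_b"], flag == False
```

Caching the remote set and recomputing it only on request keeps the common path cheap, which appears to be the purpose of the flag.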

The Site Scheduler Algorithm receives the bids from each site for each task in the
AFG (step 12), i.e., the best available processor and the predicted execution time on
the best available processor. Step 14 assigns the root task to the site that minimizes the
predicted execution time. Step 19 calculates the earliest start time (EST) of the current
task (task_i) at each site (S_j). To obtain the EST value of task_i, the summation of the
earliest finish time (EFT) and the communication cost is calculated for each immediate
predecessor task of task_i in the graph, and the maximum of these sums is taken. The EFT
of a task at a site is calculated by the summation of its EST value and the predicted
execution time of the task at the current site (step 20). As shown in step 22, the best
site of a node is the one that minimizes the EFT value. The best available site for the
current task is determined at each iteration of the while-loop from step 16 to step 25.
For an application flow graph AFG(v, e) with v nodes and e edges the while-loop takes
O(v) to compute the EST value of a node on a site (steps 15 and 16). We assume the AFG to
be a dense graph in which the number of edges is proportional to v^2. Since there are v
nodes in the AFG and k sites involved in the scheduling process, the while-loop takes
O(kv^2) time; hence the time complexity of the site scheduler algorithm is O(kv^2), since
the while-loop is the dominant part. The value of k will be much smaller than v; thus the
worst case complexity of the algorithm is O(v^2).


Site_Scheduler_Algorithm(AFG)

Step 1   Compute the level for all nodes in AFG.
Step 2   Initialize Task_List according to a non-increasing order of node level.
Step 3   Read S_remote list and the update_request flag from resource performance table.
Step 4   If update_request flag is true then
Step 5       Select k nearest neighbor sites of S_local that maximize the network
             bandwidth and store them in a set, S_remote.
Step 6       update_request <- false.
Step 7       Update S_remote and update_request in the resource performance table.
Step 8   endif
Step 9   Site_List <- S_local ∪ S_remote
Step 10  For each site S_j ∈ Site_List do
Step 11      Send AFG and Task_List for bidding.
Step 12      {Pred_Time(task_i, S_j), Best_Resource(task_i, S_j)} <-
                 Host_Selection_Algorithm(Task_List), ∀ task_i ∈ Task_List.
Step 13  endfor
Step 14  Resource_Alloc_Table(task_i) <- S_m, such that:
             Pred_Time(task_i, S_m) = min{Pred_Time(task_i, S_l)}, ∀ S_l ∈ Site_List.
Step 15  Remove task_i from the Task_List.
Step 16  while Task_List is not empty do
Step 17      task_i <- the first task in Task_List.
Step 18      For each site, S_j, in the Site_List do
Step 19          EST(task_i, S_j) <- max{EFT(task_k, S_m) + (LT(S_m, S_j) + ...)},
                     ∀ task_k ∈ Predecessor(task_i), such that:
                     S_m = Resource_Alloc_Table(task_k).
Step 20          EFT(task_i, S_j) <- EST(task_i, S_j) + Pred_Time(task_i, S_j).
Step 21      endfor
Step 22      Select Best_Site, such that:
                 EFT(task_i, Best_Site) = min{EFT(task_i, S_j)}, ∀ S_j ∈ Site_List.
Step 23      Resource_Alloc_Table(task_i) <- Best_Resource(task_i, Best_Site)
Step 24      Remove task_i from the Task_List.
Step 25  endwhile
Step 26  Multicast the Resource_Alloc_Table to the relevant sites.
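The core loop of the listing (steps 14 to 25) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the VDCE implementation: the function name site_schedule and all argument names are hypothetical, the communication cost is assumed to be zero within a site and latency plus data volume divided by bandwidth between sites, and the allocation table stores only the chosen site rather than the best processor within it.

```python
def site_schedule(task_list, preds, pred_time, latency, bandwidth, data):
    """Sketch of the EST/EFT loop over sites (steps 14-25).

    task_list: tasks ordered by non-increasing level, root task first
    preds:     task -> list of immediate predecessor tasks
    pred_time: (task, site) -> predicted execution time bid by the site
    latency:   (site_a, site_b) -> link latency (assumed input)
    bandwidth: (site_a, site_b) -> link bandwidth (assumed input)
    data:      (pred, task) -> data volume sent along the edge (assumed input)
    """
    sites = sorted({site for (_, site) in pred_time})
    alloc, eft = {}, {}
    for task in task_list:
        best_site, best_eft = None, float("inf")
        for site in sites:
            est = 0.0  # a task without predecessors can start immediately
            for p in preds[task]:
                src = alloc[p]
                # Assumed cost model: free within a site, else latency + transfer.
                comm = 0.0 if src == site else (
                    latency[(src, site)] + data[(p, task)] / bandwidth[(src, site)])
                est = max(est, eft[p] + comm)
            finish = est + pred_time[(task, site)]
            if finish < best_eft:  # keep the site with the smallest EFT
                best_site, best_eft = site, finish
        alloc[task], eft[task] = best_site, best_eft
    return alloc, eft


# Hypothetical four-task graph and two sites, for illustration only.
preds = {"LU": [], "INV1": ["LU"], "INV2": ["LU"], "MULT": ["INV1", "INV2"]}
pred_time = {("LU", "A"): 2.0, ("LU", "B"): 3.0,
             ("INV1", "A"): 4.0, ("INV1", "B"): 1.0,
             ("INV2", "A"): 4.0, ("INV2", "B"): 4.0,
             ("MULT", "A"): 1.0, ("MULT", "B"): 1.0}
latency = {("A", "B"): 1.0, ("B", "A"): 1.0}
bandwidth = {("A", "B"): 10.0, ("B", "A"): 10.0}
data = {("LU", "INV1"): 10.0, ("LU", "INV2"): 10.0,
        ("INV1", "MULT"): 10.0, ("INV2", "MULT"): 10.0}
alloc, eft = site_schedule(["LU", "INV1", "INV2", "MULT"],
                           preds, pred_time, latency, bandwidth, data)
```

For a task without predecessors the EST is zero, so the EFT rule reduces to picking the site with the smallest predicted execution time, which is what step 14 does for the root task.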

Host  Selection  Algorithm 

The Host Selection Algorithm determines the task assignments of AFG tasks on the
available processors within each site. The calculation of the EST is similar to the
previous algorithm. In this algorithm, the base communication-link bandwidth, BW_base,
is considered for all connections within a site (step 4). Additionally, the latency
within a site is negligible compared with the latency between different sites. The
communication cost between a task and its immediate predecessor is zero if they are
scheduled to the same processor. The core of the Host Selection Algorithm is the
performance prediction phase. The execution time prediction of a task on a given
resource is based on the current load of the processor, the load of the test processor
at the time of the trial run, the measured execution time for the trial run, and the
computing weights (step 5).

The measured execution time and the load value for the trial runs are retrieved from
the task-performance table, as explained in the Site Repository section of this paper.
Weight(P_j) is the computing weight [19, 20] of processor P_j with respect to the
base-processor at the site. To calculate the weight of each processor, trial runs of a
set of task implementations are executed on each processor. The ratio of the average
execution time of the trial runs on a processor P_i to the average execution time on
the base-processor gives the computing power weight of P_i. In step 6, the EFT value is
the summation of the EST and the predicted execution time. For each task, the processor
that minimizes the EFT value is selected as the best resource in this site. An iteration
of the while loop takes O(pv) time, where v is the number of nodes in AFG and p is the
number of processors in the Processor_List. Thus the time complexity of the Host
Selection Algorithm is O(pv^2).
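The computing weight described above is a ratio of average trial-run times. The following Python sketch shows that calculation; the function name computing_weights and the layout of the trial_times dictionary are assumptions made for illustration.

```python
from statistics import mean


def computing_weights(trial_times, base):
    """Return processor -> weight relative to the base processor.

    trial_times: processor -> list of measured trial-run execution times
    base:        name of the base processor at the site
    """
    base_avg = mean(trial_times[base])
    # Weight = average time on the processor / average time on the base processor.
    return {proc: mean(times) / base_avg for proc, times in trial_times.items()}


# Hypothetical measurements (seconds) for three processors.
weights = computing_weights(
    {"base": [2.0, 4.0], "p1": [1.0, 2.0], "p2": [4.0, 8.0]}, base="base")
```

With this definition the base processor has weight 1, and a weight below 1 means the processor completed the trial runs faster than the base processor.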

Host_Selection_Algorithm(Task_List)

Step 1   while Task_List is not empty do
Step 2       task_i <- the first task in Task_List.
Step 3       For each available processor, P_j, in the Processor_List do
Step 4           EST(task_i, P_j) <- max{EFT(task_k, P_m) + Comm_Cost(task_k, task_i)},
                     ∀ task_k ∈ Predecessor(task_i), such that:
                     P_m = Best_Resource(task_k) and
                     Comm_Cost(task_k, task_i) = D(k,i) / BW_base   if P_m ≠ P_j
                                                 0                  otherwise
Step 5           Pred_Time(task_i, P_j) <- ... Exec_Time(task_i, P_test) x ...
Step 6           EFT(task_i, P_j) <- EST(task_i, P_j) + Pred_Time(task_i, P_j)
Step 7       endfor
Step 8       Best_Resource(task_i) <- P_k, such that:
                 EFT(task_i, P_k) = min{EFT(task_i, P_j)}, ∀ P_j ∈ Processor_List.
Step 9   endwhile
Step 10  Return Pred_Time(task_i, Best_Resource) and Best_Resource(task_i) to S_local
         for each task.
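A compact Python sketch of the per-site selection loop is given below. It is an illustration only: the name host_select and its arguments are hypothetical, and the predicted execution times of step 5 are passed in as a precomputed table instead of being derived from load and weight values.

```python
def host_select(task_order, preds, pred_time, data, bw_base):
    """Pick, for every task, the processor with the smallest EFT.

    task_order: tasks in scheduling order (predecessors before successors)
    preds:      task -> list of immediate predecessor tasks
    pred_time:  (task, processor) -> predicted execution time (given as input)
    data:       (pred, task) -> data volume D(k, i) on the edge
    bw_base:    base bandwidth used for every link inside the site
    """
    processors = sorted({proc for (_, proc) in pred_time})
    best, eft = {}, {}
    for task in task_order:
        choice, choice_eft = None, float("inf")
        for proc in processors:
            est = 0.0
            for k in preds[task]:
                # Communication is free when both tasks share a processor.
                comm = 0.0 if best[k] == proc else data[(k, task)] / bw_base
                est = max(est, eft[k] + comm)
            finish = est + pred_time[(task, proc)]
            if finish < choice_eft:
                choice, choice_eft = proc, finish
        best[task], eft[task] = choice, choice_eft
    # The bid returned to the local site: predicted time and best processor.
    return best, {t: pred_time[(t, best[t])] for t in task_order}


# Hypothetical two-task chain on two processors.
best, bid_times = host_select(
    ["t1", "t2"],
    {"t1": [], "t2": ["t1"]},
    {("t1", "p1"): 2.0, ("t1", "p2"): 3.0, ("t2", "p1"): 5.0, ("t2", "p2"): 2.0},
    {("t1", "t2"): 8.0},
    bw_base=4.0,
)
```

In this example t2 moves to p2 even though that costs a transfer of 8.0 / 4.0 = 2.0 time units, because its finish time there (6.0) is still earlier than on p1 (7.0).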


3.3 VDCE Runtime System

The VDCE Runtime System sets up the execution environment for a given application
and manages the execution to meet the hardware/software requirements of the
application. The VDCE Runtime System separates control and data functions by allocating
them to the Control Virtual Machine (CVM) and Data Virtual Machine (DVM), respectively.
CVM measures the loads on the resources (hosts and networks) periodically and monitors
the resources for possible failures. CVM daemons control the execution of the application
tasks on the assigned resources based on the performance and quality of service
requirements. Application visualization (real-time or post-mortem) services are provided
by CVM. DVM provides an execution environment for a given VDCE application by binding
tasks so that they can interact and communicate efficiently. DVM supports socket-based
point-to-point connections for inter-task communications.

Control  Virtual  Machine  (CVM) 

The functionality of CVM is provided by the following four processes: Site_CVM,
Local_CVM, Monitor, and Cluster Manager (see Figure 5). Each VDCE machine runs a
Local_CVM process and a Monitor daemon. Additionally, one of the machines within
each cluster executes the Cluster Manager process. Each site (domain) has a Site_CVM
process located at the VDCE Server machine. The main functions of the stated CVM
processes are given below:

• Retrieving Resource Performance Parameters. VDCE resources are periodically
monitored to collect up-to-date values of processor and network parameters that
were given in the Site Repository subsection of this paper. The Monitor daemon
of each machine measures the up-to-date parameters every 30 seconds
and updates its fields at the Cluster leader machine shown in Figure 5. The Cluster
Manager daemon gathers the parameters of machines within the cluster in a table
and forwards the table to the Site_CVM every 60 seconds. In the future
implementation the Cluster Manager will be modified to send only the workloads
of the resources that have changed considerably from the previous measurement.
The workload of a resource is considered significantly changed if the up-to-date
measurement differs from the previous measurement by more than the width
of the confidence interval [22].

• Updating the Site Repository. The Site_CVM periodically updates the
resource-performance table at the site repository with the parameters that are
collected from Cluster Managers. The execution time and load measurement of
benchmarking runs of tasks are stored at the task-performance table.

• Monitoring the VDCE Resources. When a Monitor daemon of a processor stores its
parameters, it reads the random number that was generated by the Cluster Manager
and updates its alive_check field with this value. Every 60 seconds the Cluster
Manager compares its alive_check field with each cluster machine's alive_check field.
The machines with a different value are marked as down; others are marked alive.
After the comparison, the Cluster Manager assigns a new random number for its
alive_check field. The monitor information is forwarded to the Site_CVM with the
resource parameters to be stored at the site repository. The machines that are
marked as down at the resource-performance table are not selected by the
Application Scheduler.

• Sending the Related Portion of the Resource Allocation Table. After the resource
allocation table is generated by the Application Scheduler, the Site_CVM multicasts
it to the Cluster Managers that will be involved in the execution. If a machine in
a cluster is assigned for a task execution, the Cluster Manager sends an execution
request message and related parts of the resource allocation table to the Local_CVM
of the machine.

•  Inter-site  Coordination.  As  explained  in  Section  3.2,  the  Application  Scheduler  at 
the  local  site  selects  a  subset  of  remote  sites  and  multicasts  the  application  flow 
graph  to  these  sites.  The  remote  sites  run  the  Host  Selection  Algorithm  locally  and 
transfer  the  mapping  decisions  to  the  sender  site.  The  inter-site  coordination  and 
message transfer are handled by Site_CVMs.

• Initializing the Application Execution Environment. After the Local_CVM receives an
execution request message from the Cluster Manager, it activates the DVM. The
DVMs on the assigned machines set up the application execution environment by
starting the task executions and creating point-to-point communication channels for
inter-task data transfer. Figure 6 shows part of the execution environment of the
Linear Equation Solver application discussed in Section 3.1. Machine 1 will execute
the LU_Decomposition task, which is followed by the execution of Matrix_Inversion
tasks on Machine 2 and Machine 3. When all the required acknowledgments are
received, an execution startup signal is sent to start the application execution.

• Managing the Application Execution. The Local_CVM monitors the application
execution on the assigned machines and maintains the performance, fault tolerance,
and QoS requirements of the application tasks. If the current load on any of these
machines is more than a predefined threshold value, the Local_CVM terminates
the task execution on the machine and sends a task rescheduling request to the
Site_CVM through the Cluster Manager.

Figure 5: Interactions Among the Control Virtual Machine Components

Figure 6: Setting Up the Application Execution Environment
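The alive_check exchange in the monitoring function above can be sketched as a small Python class. The class and method names are hypothetical, the 60-second timer is left out (sweep would be called by whatever drives it), and redrawing the token until it differs from the previous one is an added safeguard that the text does not mention.

```python
import random


class ClusterManager:
    """Minimal sketch of the alive_check liveness test."""

    def __init__(self, machines, rng=None):
        self.rng = rng or random.Random()
        self.alive_check = self.rng.getrandbits(32)
        # machine -> last alive_check value written by its Monitor daemon
        self.table = {m: None for m in machines}

    def report(self, machine):
        """Called when a Monitor daemon stores its parameters."""
        self.table[machine] = self.alive_check

    def sweep(self):
        """Compare tokens, mark machines, then issue a new token."""
        status = {m: "alive" if value == self.alive_check else "down"
                  for m, value in self.table.items()}
        new_token = self.rng.getrandbits(32)
        while new_token == self.alive_check:  # assumption: avoid token reuse
            new_token = self.rng.getrandbits(32)
        self.alive_check = new_token
        return status


manager = ClusterManager(["m1", "m2"], rng=random.Random(0))
manager.report("m1")            # only m1's Monitor daemon reports in
first = manager.sweep()
manager.report("m2")            # in the next period only m2 reports
second = manager.sweep()
```

A machine that stops reporting keeps a stale token, so it is marked down at the first sweep after the token changes.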


Data  Virtual  Machine  (DVM) 

DVM is a socket-based, point-to-point communication system for inter-task
communications. Therefore, any machine that supports socket programming can be part
of VDCE. As shown in Figure 6, the DVM activates the communication proxy and sends the
resource allocation information, including the socket number, IP address for target
machine, etc., that will be used for the communication channel setup. After the setup
is completed successfully, the communication proxy sends an acknowledgment to the
Local_CVM. The execution startup signal is sent to start the task executions.

On the other hand, for a thread-based programming environment, the Data Manager
consists of three threads that are initiated by the communication proxy: send thread,
receive thread, and compute thread. After the communication channel is established, the
send and receive threads are activated for data transfer and the compute thread performs
the task execution. The control transfer between the Local_CVM and the DVM (or any
other control transfer on the same machine) is based on an inter-process communication
mechanism (i.e., pipes or shared-memory paradigm). The data transfer among the
communication proxies (or between send and receive threads for multithreaded systems)
uses a socket-based, message-passing mechanism.
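The three-thread arrangement can be illustrated with Python's standard threading, queue, and socket modules. This is a sketch under assumptions rather than the DVM design: the function name run_task is hypothetical, messages are assumed to be newline-delimited text, and in-process queues stand in for whatever buffers connect the threads.

```python
import queue
import socket
import threading


def run_task(in_sock, out_sock, compute):
    """Start receive, compute, and send threads for one task."""
    inbox, outbox = queue.Queue(), queue.Queue()

    def receive():
        # Read newline-delimited messages until the sender closes its side.
        with in_sock.makefile("r") as stream:
            for line in stream:
                inbox.put(line.rstrip("\n"))
        inbox.put(None)  # end-of-input marker

    def work():
        while (item := inbox.get()) is not None:
            outbox.put(compute(item))
        outbox.put(None)

    def send():
        while (item := outbox.get()) is not None:
            out_sock.sendall((item + "\n").encode())
        out_sock.shutdown(socket.SHUT_WR)  # tell the receiver we are done

    threads = [threading.Thread(target=f) for f in (receive, work, send)]
    for t in threads:
        t.start()
    return threads


# Local demonstration: socket pairs stand in for the channels between machines.
upstream, task_in = socket.socketpair()
task_out, downstream = socket.socketpair()
threads = run_task(task_in, task_out, str.upper)
upstream.sendall(b"lu\ninv\n")
upstream.shutdown(socket.SHUT_WR)
for t in threads:
    t.join(timeout=5)
received = b""
while chunk := downstream.recv(1024):
    received += chunk
```

Separating the three roles lets data transfer overlap with computation: the receive thread can buffer the next input while the compute thread is still busy.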

Since user tasks can be programmed with various message-passing tools, the VDCE
Runtime System supports multiple message-passing libraries such as P4, PVM, MPI, and NCS.
Additionally,  the  VDCE  Runtime  System  provides  data  conversions  that  might  be  needed 
when  an  application  execution  environment  includes  heterogeneous  machines.  The  VDCE 
Runtime  System  provides  several  user-requested  services  such  as  I/O  service,  console 
service,  and  visualization  service.  A  user  can  request  these  services  while  developing 
his/her  application  with  the  Application  Editor.  I/O  Service  provides  either  file  I/O  or 
URL  I/O  for  the  inputs  of  the  application  tasks.  The  user  can  suspend  and  restart  the 
application  execution  with  the  console  service.  The  VDCE  visualization  service  provides 
both  real-time  and  post-mortem  visualizations.  There  are  three  types  of  visualizations 
provided  in  VDCE: 

• Application Performance Visualization: The execution time of tasks in an
application is visualized.

•  Workload  Visualization:  Up-to-date  workload  information  on  VDCE  resources  is 
visualized. 

• Comparative Visualization: VDCE makes it possible for an end user to experiment
with and evaluate his/her application on different combinations of hardware and
software media by providing comparative performance visualization.


4  VDCE  Testbed:  Experimental  Results  and  Discussion 

The current VDCE prototype consists of two sites, one at Syracuse University and the
other at Rome Laboratory, that are connected by the NYNET ATM Wide Area Network,
as  shown  in  Figure  7.  Each  site  or  domain  has  a  VDCE  server,  a  Site  Repository  and 
several  computing  clusters.  At  the  Syracuse  University  site  there  are  three  computing 
clusters:  HPDC,  CAT,  and  TOP.  The  HPDC  cluster  consists  of  several  ATM  switches 
and  ATM  concentrators  that  connect  high-performance  workstations  and  PCs  at  a  rate 
of  155  and  25  Mbps,  respectively  (URL:http//www.atm.syr.edu).  The  TOP  and  CAT 
clusters  have  SUN  SPARCs,  SUN  IPXs  and  IBM  RS6000s  that  are  connected  to  the  ATM 
cluster  through  the  Ethernet.  The  Rome  Lab  site  consists  of  three  clusters  that  include 
SUN,  Digital,  and  HP  workstations. 

In this section we discuss and evaluate the performance of the current VDCE prototype
in implementing two important tasks: 1) the use of VDCE as an evaluation tool
for the parallel implementations of the VDCE library tasks using different numbers of
workstations and different networks to connect them (e.g., ATM or Ethernet); and 2)
the use of VDCE as a problem-solving environment for large-scale VDCE applications.

Figure 7: The configuration of the VDCE Testbed

4.1 Experiment 1: Using VDCE as a Parallel Evaluation Tool

In this experiment we used the matrix multiplication (MULT) task as a running example
to show the use of the VDCE for experimentation and to evaluate the performance
of different configurations when the number of computers, network types, and problem
sizes are changed. We compared the time and effort required to perform such tasks with
and without using the VDCE. We benchmarked the sequential and parallel algorithms
of matrix multiplication (MULT) based on various machine and network configurations
and problem sizes. The parallel implementation of the MULT (A x B = C) task is based on
the host-node programming model. The host (master) process distributes the rows of
matrix A evenly among the processes (where each process runs on one workstation), while
all the slave processes receive the entire B matrix. Each slave process computes its
part of the result matrix C and sends it back to the host process.
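The row distribution described above can be sketched in plain Python. The sketch runs the worker computations one after another in a single process, whereas in the experiment each slave process runs on its own workstation; the function names are hypothetical.

```python
def split_rows(n_rows, n_workers):
    """Return (start, stop) row ranges that differ in size by at most one."""
    base, extra = divmod(n_rows, n_workers)
    bounds, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def worker_multiply(a_rows, b):
    """What one slave does: multiply its rows of A by the whole of B."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a_rows]


def host_multiply(a, b, n_workers):
    """What the host does: hand out row blocks and collect the parts of C."""
    c = []
    for start, stop in split_rows(len(a), n_workers):
        c.extend(worker_multiply(a[start:stop], b))
    return c


product = host_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]], n_workers=2)
```

Because every slave needs all of B but only its own rows of A, the data sent to each slave is one copy of B plus roughly 1/n of A, and each slave returns roughly 1/n of C.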

The VDCE provides a web-based, user-friendly interface that allows a novice programmer
to experiment with and evaluate different parallel configurations of each VDCE
task in minutes. We argue that performing similar evaluation tasks is almost impossible
for novice programmers and requires hours or even days of work by an expert
programmer using parallel processing, message passing, and visualization tools. With
VDCE, once a task library is registered to the VDCE site repository, any VDCE user can
use that task or any existing VDCE task by just clicking on the task name in the
Application Editor. Once the task is selected, the user can click on one button to
determine the problem size, the number of computers to be involved in the computation,
and the network to be used to connect them. Selecting the VDCE task and specifying how
it will be implemented can be done in a few minutes. Once that is done, the task
configuration can be run and its execution time visualized immediately without any
effort other than clicking on the execute and visualize buttons.

Figure 8 shows the execution times of the VDCE-based matrix multiplication algorithm
for problem sizes of 512 x 512 and 1024 x 1024. The result for the p4-based
implementation of the same multiplication algorithm is given in Figure 9. The
experiments were done for one, two and four Sun SPARCs that are connected by an IP/ATM
network. We also evaluated the performance of the MULT task on a heterogeneous cluster
of four SUN SPARCs and four IBM RS6000 workstations. The objective of such an
evaluation is to provide users with a better understanding of the performance of
parallel processing algorithms when there is a change in problem size, number of nodes,
or network type. As an example, for the p4-based matrix multiplication algorithm, we
can determine from Figure 9 that eight nodes provide the best performance among the
test cases.

Figure 8: Execution Time of Matrix Multiplication Task Using VDCE

Table 2 compares the times required to develop, compile, execute, and visualize a
Matrix Multiplication task using p4 and VDCE for a 1024 x 1024 problem size with four
nodes. In the design and implementation phase, it takes around 862 minutes for a
parallel programming expert to develop a p4-based multiplication program (431 lines)
from scratch if we assume that programming speed is two minutes per line. If the
programmer has no experience with p4, he/she will spend more time to learn about it and
to develop an application. For VDCE, even if the user does not have any knowledge about
parallel programming but wants to run the application in parallel, the only thing
he/she needs to do is to choose the parallel option in the application design window of
the Application Editor. Additionally, he/she can easily define the I/O for a task using
the Application Editor. The total time for developing a VDCE MULT application is 2.10
minutes.
[Plot title: Problem Size: 1024 x 1024]

Figure 9: Execution Time of Matrix Multiplication Task Using p4


Table 2
The performance comparison of matrix multiplication task for each software phase

Phase                          p4                      VDCE
Design and development         862 min. (431 lines)    2.10 min.
Compilation                    7.01 sec.               0 sec.
Runtime setup                  0.980 sec.              0.015 sec.
Task execution                 0.194 sec.              0.136 sec.
Visualization and evaluation   1890 sec.               0.095 sec.

There is no compilation time in VDCE after the VDCE MULT application is designed.
The location of the executable for MULT task on the selected resource is provided in the
resource allocation information, which is retrieved from the task constraints table. The
executable is then linked to the I/O module. In the p4 version the MULT program takes
7.01 seconds for compilation. The runtime setup time in VDCE is for the CVM to transfer
the activation and resource allocation information to DVM and to wait for the
acknowledgment, which takes 15 milliseconds for the MULT task on the selected resource.
For a p4 application, the user creates a configuration file, i.e., procgroup file, and
manually links it to the p4 application, which takes 980 milliseconds. VDCE runs the
application automatically with the "Execute Application" button and generates the
results in the selected output file. The execution time of MULT task is 136 milliseconds
when it is executed on four nodes over the ATM. The execution time is 194 milliseconds
using a p4 program with the same configuration.

VDCE  provides  dynamic  and  post-mortem  visualization  of  the  application.  A  VDCE 
user  monitors  the  load  of  all  machines  dynamically  in  the  domain  and  he/she  can  consider 
the  load  information  to  select  an  appropriate  machine  and/or  a  cluster.  In  addition,  the 
execution  time  of  each  module  within  an  application  is  visualized  in  VDCE.  It  takes 
95  milliseconds  to  invoke  the  VDCE  visualization  window  for  the  MULT  task.  If  a  p4 
user  wants  to  visualize  the  execution  time  to  compare  its  performance  with  others,  it  is 
necessary  to  use  another  graphic  tool.  The  visualization  and  evaluation  time  depends  on 
which  tool  is  used;  as  an  example,  “gnuplot”  takes  1890  seconds. 

4.2 Experiment 2: Using VDCE as a Problem Solving Environment

In this experiment we demonstrated how the VDCE can enable a novice programmer to
develop large-scale parallel and distributed applications running on geographically
distributed heterogeneous resources. Implementing such applications is currently a
challenging and time-consuming programming problem, even for experts on parallel and
distributed programming tools. A distributed application can be viewed as an
Application Flow Graph (AFG), where its nodes denote computational tasks and its links
denote the communications and synchronization between these nodes. Without an
application development tool, a developer or development team must apply much effort
and time to develop a distributed application from scratch. The VDCE provides a
web-based interface to enable users to develop, configure, execute, and visualize such
a distributed application in a few minutes. However, to perform the same tasks in a
non-VDCE case, the user or development team needs to develop techniques for the modules
running on different computers to interact and communicate, and they need to develop or
integrate techniques to run and manage the execution of the distributed application, as
well as collect and visualize the required performance results.
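An AFG of this kind can be held as a successor map, from which node levels and a level-ordered task list follow directly. The Python sketch below makes two assumptions that the text does not state: the level of a node is taken to be the length of the longest path from that node to an exit node, and the edges shown (one LU task feeding two INV tasks that feed one MULT task) are a hypothetical shape used only for illustration.

```python
from functools import lru_cache


def node_levels(succs):
    """Return node -> level, where an exit node has level 0."""

    @lru_cache(maxsize=None)
    def level(node):
        children = succs[node]
        return 0 if not children else 1 + max(level(c) for c in children)

    return {node: level(node) for node in succs}


# Hypothetical AFG: nodes are tasks, links are data dependencies.
afg = {"LU": ["INV1", "INV2"], "INV1": ["MULT"], "INV2": ["MULT"], "MULT": []}
levels = node_levels(afg)
# Non-increasing level order places every task after its predecessors.
task_list = sorted(afg, key=lambda node: -levels[node])
```

Under this definition a predecessor always has a strictly higher level than its successors, so a scheduler walking the list sees the finish times of all predecessors before it reaches a task.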

To solve these difficulties, VDCE provides an integrated problem solving environment
to enable novice users to develop large-scale, complex, distributed applications using
VDCE tasks. The Linear Equation Solver (LES) application has been selected as a
running example. Figure 4 shows the AFG of Linear Equation Solver, which consists of
an LU Decomposition (LU) task, two Matrix Inversion (INV) tasks and Matrix
Multiplication (MULT) tasks. The problem size for this experiment is 1024 x 1024 using
four nodes, which are SUN SPARCs and IBM RS6000 machines that are connected by an
ATM network.

Table 3 ^1
Performance comparison of linear equation solver application for each software phase

                            p4                                       VDCE
Phase                       LU           INV          MULT           LU              INV         MULT
Design and development      838 min.     1314 min.    862 min.       2.10 min.       1.57 min.   2.30 min.
                            (419 lines)  (657 lines)  (431 lines)
Compilation                 6.45 sec.    8.10 sec.    7.01 sec.      0 sec.          0 sec.      0 sec.
Runtime setup               1.200 sec.   1.580 sec.   0.980 sec.     0.043 sec. ^2
Task execution              0.386 sec.   0.556 sec.   0.194 sec.     0.801 sec.      1.360 sec.  0.140 sec.
Application execution       1.691 sec.                               1.451 sec.
Application visualization   3200 sec.                                0.140 sec.

Table 3 compares the timing of several software phases for a Linear Equation Solver
application using p4 and VDCE. When a user has enough knowledge about parallel
programming and p4, he/she will spend 838 minutes for an LU task, 1314 minutes for
an INV task, and 862 minutes for a MULT task. The total time to develop the application
for a non-VDCE version is approximately 3014 minutes (i.e., around 50 hours). Using
VDCE, a novice user spends around six minutes to develop such an application. There is
no compile time for VDCE, but a p4 application needs 21 seconds for compilation. The
VDCE setup time for a Linear Equation Solver application is 43 milliseconds. The p4
user should create all procgroup files and launch them in order, which takes around
eight seconds.

Since the VDCE is based on the data flow model and executes tasks automatically,
there may be overlap among task executions that causes the total execution time of the
VDCE application, including the setup time, to be less than the summation of all
individual task execution times. In our experiment with the Linear Equation Solver
application, the total execution time of p4 parallel execution using four nodes is 1691
milliseconds. A VDCE-based execution with the same configuration takes 1451
milliseconds, which outperforms the p4 by 16%.
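The headline figures can be checked with a few lines of Python. The helper names are hypothetical, and the two percentage definitions are offered as possible readings of the comparison: the reported 16% agrees with measuring the p4 excess relative to the VDCE time, which is an inference from the numbers rather than something the text states.

```python
def percent_longer(slow_ms, fast_ms):
    """How much longer the slower run takes, relative to the faster run."""
    return 100.0 * (slow_ms - fast_ms) / fast_ms


def percent_saved(slow_ms, fast_ms):
    """How much time the faster run saves, relative to the slower run."""
    return 100.0 * (slow_ms - fast_ms) / slow_ms


p4_ms, vdce_ms = 1691, 1451
longer = round(percent_longer(p4_ms, vdce_ms), 1)
saved = round(percent_saved(p4_ms, vdce_ms), 1)

# Development effort for the p4 version, from the per-task figures in the text.
dev_minutes = 838 + 1314 + 862
dev_hours = round(dev_minutes / 60, 1)
```

Relative to the VDCE time, p4 takes about 16.5% longer; relative to the p4 time, VDCE saves about 14.2%.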


5 ADAPTIVE DISTRIBUTED VIRTUAL COMPUTING ENVIRONMENT (ADViCE)

5.1 Introduction

With the proliferation of wireless networks, metacomputing services can be extended to
include mobile users and resources. A mobile metacomputing environment not only allows
users to access information servers from mobile computers, but also enables them to
develop, run, and visualize large scale parallel and distributed applications running on
heterogeneous computers that are connected by wired and wireless networks.

^1 The last two rows of Table 3 are for the total time of the application.

^2 It is the total setup time for a VDCE-based linear equation solver application.

The  main  goal  of  the  ADViCE  project  is  to  extend  the  current  VDCE  to  support 
mobile  users  and  resources.  ADViCE  provides  a  parallel  and  distributed  programming 
environment;  it  provides  an  efficient  web-based  user  interface  that  allows  users  to  develop, 
run  and  visualize  parallel/distributed  applications  running  on  heterogeneous  computing 
resources connected by wired and wireless networks. Consequently, the fact that some of
the resources (such as users, computers, storage devices and networks) are mobile becomes
transparent to the users and the application developers.


5.2  Related  Work 

In  this  section  we  provide  a  brief  overview  of  the  issues  related  to  parallel  and  distributed 
programming  environments  and  mobile  computing. 

5.3  Parallel  and  Distributed  Software  Development  Issues 

The  software  development  process  of  parallel  and  distributed  applications  can  broadly  be 
described  in  terms  of  three  phases:  a)  Application  design  and  specification,  b)  Application 
scheduling  and  resource  configuration,  and  c)  Application  execution  and  runtime. 

• Application Design and Specification: In a well-integrated execution environment
it is important to provide: a) an easy-to-use interactive user-interface to
design and specify parallel distributed applications and, b) well-developed graphical
utilities for the visualization of results and program behavior. Generally, writing
parallel and distributed programs overwhelms users due to the difficulty of explicitly
expressing communication and synchronization among the computations [7].
A graph-based programming environment, in which a program is defined as a directed
graph where nodes denote computations and links denote communication and
synchronization between nodes, may be used to decrease the work of programmers.
Currently, there are a few visual parallel programming languages and environments,
such as Computationally Oriented Display Environment (Code) [11], Heterogeneous
Network Computing Environment (HeNCE) [12], and Zoom [13]. To develop a Code
or HeNCE application, a programmer first expresses the sequential computations in
a standard language and then specifies how they are to be composed into a parallel
program. Zoom is a hierarchical abstraction for describing heterogeneous
applications. The Zoom representation of an application can be translated into a HeNCE
program for execution [12]. Currently, there is an increased interest in developing
web-based application development tools and environments because of the explosive
use of internet applications [29].

The ADViCE graphical user interface is a web-based GUI that has been developed using
the JAVA programming language and JAVA servers.

• Application Scheduling and Resource Configuration: After the application is specified
and developed, the application tasks need to be assigned to the available computing
and storage resources. Although the task scheduling (or resource
allocation) problem has been investigated extensively in the literature, most of the algorithms and
systems are valid only for specific architectures and/or certain classes of applications.
One interesting general scheduling framework is APPLeS [14]. APPLeS
proposes application-level scheduling in which all system aspects are evaluated with
respect  to  application  performance.  APPLeS  develops  a  customized  schedule  for 
each  application  by  including  user-specific,  application-specific,  system-specific,  and 
dynamic  information  in  its  scheduling  decision.  There  are  resource  management 
systems  to  provide  load  sharing  and  resource  allocation  such  as  the  Condor  project 
that  has  been  developed  at  the  University  of  Wisconsin  [31].  Condor  is  a  distributed 
batch  system  for  sharing  the  workload  of  compute-intensive  jobs  in  a  pool  of  UNIX 
workstations connected by a network. In ADViCE, we follow an approach similar to
APPLeS: for each parallel and distributed application, the system generates
at runtime an adaptive schedule that can optimize for the requirements of an application,
such as performance, fault-tolerance, or security.

•  Application  Execution  and  Runtime:  The  application  execution  and  runtime 
phase executes the developed and configured application. This stage integrates the
resources that have been assigned to run the application tasks. The software
tools used for the execution of the application can be based either on message-passing
tools such as PVM [23], P4 [25], MPI [24], and NCS [26] or on
distributed shared memory (DSM) [3, 4, 5, 6]. In addition, there are a few projects
targeted  toward  providing  a  metacomputing  environment  on  diverse  resources.  The 
earliest  metacomputer,  the  NCSA  Metacomputer  [27],  was  an  integration  of  several 
MPPs, mass storage units, visualization and I/O devices. Globus [21], Legion [28],
and VDCE [8, 10] are targeted toward the development of metacomputing environments.
Additionally, there are several web-based metacomputing projects [29] that
either  use  the  JAVA  programming  language  as  the  main  computation  language  or 
provide  a  coordination  medium  based  on  WWW  technologies  or  the  JAVA  language. 
There  may  be  some  drawbacks  to  these  methods.  First,  they  may  not  support  the 
programs  written  in  other  languages  such  as  C  and  Fortran.  Second,  they  may  sup¬ 
port  communication  only  between  a  server  and  a  client,  which  restricts  the  execution 
of the candidate applications. The ADViCE runtime system is based on message-passing
tools and is implemented using P4 and NCS. We also use JAVA and web
servers to perform all the control, management, and visualization functions, while
we use C, C++, Fortran, or any other language to program the application tasks.
In  other  words,  our  approach  is  open  and  can  support  any  language  to  implement 
the  application  tasks. 

5.4  Mobile  Computing  Issues 

Mobile computing is becoming an increasingly important programming environment, yet
there has been very little research addressing the programming issues in such an environment
or how to integrate it into current parallel and distributed programming environments
with stationary resources. The main characteristics and constraints of mobile
computing are [1, 2]: 1) the use of wireless networks makes mobile resources resource-poor
relative to stationary resources, and communication performance and reliability
vary widely; 2) mobile resources complicate the issues related to resource locations and
portability; and 3) mobile resources rely on a finite energy source. The main limitations
of developing mobile parallel and distributed programming environments include
the following:

• The use of wireless networks implies that applications will experience low transfer
rates and unreliable communication links. We expect this limitation to ease in
the future as the use of wireless technology expands and more progress is made in
increasing the transfer rate over wireless networks.

• The current techniques to support dynamic task migration and adaptive resource
configuration are rigid and cannot run efficiently when the computing and storage
resources are fixed and/or mobile. For example, some of the tasks
associated with a parallel and distributed application could be running on several
high performance computers that are connected by a fiber-optic high speed network
while other tasks are running on computers that are connected by a low speed,
unreliable wireless network. The performance of this application will be drastically
affected by the performance of the communication services offered by the wireless
network.

The main goal of the ADViCE prototype is to integrate stationary parallel and distributed
computing environments with mobile computing. We developed an efficient approach
to support adaptive programming and services for both mobile and stationary
resources. In general, there are two extremes for supporting adaptation [1]: 1) make
adaptation entirely the responsibility of individual applications, and 2) make
adaptation completely transparent to the application, so that it must be supported by
the system. The first approach avoids the need for system support, but it lacks the ability
to resolve incompatible resource demands of different applications and to enforce limits
on resource usage. The second approach is attractive since it can add adaptivity to existing
applications so they can run on mobile resources without any modifications. The adaptivity
approach supported in ADViCE is a combination of these two schemes. The user can
specify the application adaptivity requirements during application development. The
ADViCE runtime system is responsible for maintaining the adaptivity requirements of
the application during its execution.

5.5 Overview of ADViCE Architecture

The  ADViCE  can  be  viewed  as  a  collection  of  geographically  dispersed  computational 
sites  or  domains,  each  of  which  has  its  own  set  of  ADViCE  servers  as  shown  in  Figure  10. 
In  any  ADViCE,  the  users,  fixed  or  mobile,  access  the  ADViCE  servers  (Visualization  and 
Editing  Server  (VES)  and  Control  and  Management  Server  (CMS))  to  develop  parallel 
and  distributed  applications  that  can  run  on  fixed  or  mobile  computing  resources  (see 
Figure  10).  In  ADViCE,  the  users  are  provided  with  a  seamless  parallel  and  distributed 
computing  environment  that  provides  all  the  software  tools  to  develop,  schedule,  run 
and  visualize  large  scale  parallel  and  distributed  applications.  In  other  words,  ADViCE 
supports  the  following  types  of  transparency: 

• Access Transparency: The users can log in and access all the ADViCE resources
(mobile and/or fixed) regardless of their locations.

• Mobile Transparency: ADViCE transparently supports mobile and fixed
users and resources.

• Configuration Transparency: The resources allocated to run a parallel and distributed
application can be dynamically changed in a transparent manner; that is,
the applications or users do not need to make any adjustment to reflect the changes
in the resources allocated to them.

• Fault-Tolerance Transparency: The execution of a parallel and distributed application
can tolerate failures in the resources allocated to run that application. The
number of faults that can be tolerated depends on the redundancy level used to run
the application.

•  Performance  Transparency:  The  resources  allocated  to  run  a  given  parallel  and 
distributed  application  might  change  dynamically  and  in  a  transparent  manner  to 
improve  the  application  performance. 


Figure  10:  Adaptive  Changes  in  the  ADViCE  environment. 

Due to changes in the network traffic or to failures, it might be necessary to move
the execution environment of an application from one ADViCE domain to another, as
shown in Figure 10. During the switch from one ADViCE environment to another,
one or more ADViCE servers, as well as the resources allocated to run a given ADViCE
application, might be switched. In Figure 10, when the application execution environment
is switched from ADViCE1 to ADViCE2, the VES is changed while the CMS is kept the
same in both environments.

Our approach to implementing the ADViCE architecture is based on identifying a set
of servers that are essential to provide the required tools for any parallel and distributed
programming environment. The current prototype is built using two web-based servers, as
shown in Figure 11: the Visualization and Editing Server (VES) and the Control and Management
Server  (CMS).  The  ADViCE  architecture  can  be  generalized  to  more  than  two  servers. 
However,  in  our  implementation,  we  used  only  two  servers  to  simplify  the  implementation 
of  the  required  ADViCE  services.  The  VES  provides  all  the  editing  and  visualization  ser¬ 
vices  essential  for  the  application  development,  while  the  CMS  provides  all  the  services 
required  to  schedule,  control  and  manage  the  execution  of  the  application  so  it  can  dy¬ 
namically  adapt  its  execution  environment  to  maintain  its  quality  of  service  requirements. 
In what follows, we briefly describe the basic services offered by the ADViCE servers.


Figure  11:  The  Main  Components  of  the  ADViCE  Architecture. 


5.5.1  Visualization  and  Editing  Server  (VES) 

This  server  provides  two  main  application  development  services:  Application  Editing 
Service  (AES)  and  Application  Visualization  Service  (AVS). 

Application Editing Service (AES)

The  AES  is  a  web-based  graphical  user  interface  for  developing  parallel  and 
distributed  applications.  The  AES  provides  users  with  commands  to  develop 
and  run  a  new  or  an  existing  parallel  and  distributed  application.  The  main 
functions  offered  by  the  AES  are  connection  establishment  and  application 
editor. 

• Connection Establishment: Before the end-user connects to the appropriate
VES, a default server is initially used to perform the logical-physical
mapping. The default VES determines the appropriate VES
based on the user's location and current system performance parameters.
Once the appropriate VES is identified, the authorization and
authentication procedures are invoked by the selected VES before
the user is allowed to use the ADViCE services. After the user successfully
passes all the security procedures, the AES invokes the Application
Editor window to provide the user with the tools required to develop
parallel and distributed applications.

•  Application  Editor:  The  application  editor  provides  menu-driven  task 
libraries  that  are  grouped  in  terms  of  their  functionality,  such  as  matrix 
algebra  library,  command  and  control  task  library,  etc.  A  selected  task 
is  represented  as  a  clickable  and  draggable  graphical  icon  in  the  active 
editor area. Using the application editor, the user can develop an Application
Flow Graph (AFG), which is a directed graph where the nodes
denote library tasks and links denote the communication/synchronization
between the nodes. The application editor also provides users with the
capability to specify the task configuration; that is, whether to run each task
sequentially or in parallel and, if in parallel, how many nodes to use to execute
that task (see Figure 12).
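The AFG described above can be sketched as a small data structure. The following Python fragment is a minimal illustration, not the ADViCE implementation: the class names, the field names (`mode`, `nodes`), and the task names are hypothetical. It records each library task with its sequential/parallel configuration, stores directed links as predecessor sets, and derives one valid execution order with a topological sort.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Task:
    """One AFG node: a library task plus its execution configuration."""
    name: str                 # hypothetical library task name
    mode: str = "sequential"  # "sequential" or "parallel"
    nodes: int = 1            # number of machines when mode == "parallel"


@dataclass
class ApplicationFlowGraph:
    tasks: dict = field(default_factory=dict)  # task name -> Task
    links: dict = field(default_factory=dict)  # task name -> set of predecessors

    def add_task(self, task):
        self.tasks[task.name] = task
        self.links.setdefault(task.name, set())

    def add_link(self, src, dst):
        """Directed link: dst receives data from (and waits for) src."""
        if src not in self.tasks or dst not in self.tasks:
            raise KeyError("both endpoints must be added as tasks first")
        self.links[dst].add(src)

    def execution_order(self):
        """Return one valid order; raises graphlib.CycleError on a cycle."""
        return list(TopologicalSorter(self.links).static_order())


# Hypothetical three-task application built from a matrix algebra library.
afg = ApplicationFlowGraph()
afg.add_task(Task("LU_Decomposition", mode="parallel", nodes=2))
afg.add_task(Task("Matrix_Inversion"))
afg.add_task(Task("Matrix_Multiplication", mode="parallel", nodes=4))
afg.add_link("LU_Decomposition", "Matrix_Inversion")
afg.add_link("Matrix_Inversion", "Matrix_Multiplication")
order = afg.execution_order()
```

Storing predecessors rather than successors keeps the structure directly usable by a scheduler, which needs to know what each task is waiting for.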


Figure  12:  An  Application  Flow  Graph  Example. 


Application Visualization Service (AVS)

This service enables the user to visualize the application execution time and
system runtime parameters. For example, Figure 13 shows the execution time
for each task in the application shown in Figure 12. In addition, the AVS
shows the total execution time of the application and the setup time of the
application execution environment.


Figure  13:  The  Performance  of  each  Application  Task. 


5.5.2  Control  and  Management  Server  (CMS) 

The  main  services  of  the  CMS  include  Application  Resource  Service  (ARS),  Application 
Management  Service  (AMS),  Application  Control  Service  (ACS),  and  Application  Data 
Service (ADS). In addition, the CMS maintains two databases (see Figure 11): one to store
the configuration and status information about the resources available in an ADViCE domain
(a domain is a distributed computing environment controlled by one organization
or administration), and one to store the task performance information (e.g.,
execution times of each ADViCE library task on different computing platforms). The task
performance  database  is  used  to  estimate  the  task  execution  time  on  different  computing 
platforms  and  is  used  by  the  ARS  to  optimize  the  allocation  of  resources  to  application 
tasks. 

Application Resource Service (ARS)

The main function of the ARS is to interpret the application flow graph generated
by the AES and then allocate resources to the application tasks to
optimize a certain criterion such as performance, fault-tolerance, or any other
requirement specified by the user. The scheduling services of the ARS include
Performance-based Scheduling, Security-based Scheduling, and Fault Tolerance-based
Scheduling. The performance-based scheduling determines the mapping
of tasks to resources that will maximize the application performance, while
the security-based scheduling allocates to the application tasks only the resources
that meet the application's security requirements. Similarly, the fault
tolerance-based scheduling allocates redundant resources to run each application
task such that the application execution can tolerate a certain number of
failures in the resources allocated to execute the application. In addition, the
ARS provides an application rescheduling capability in order to reallocate
application tasks whose executions have been interrupted due to
changes in network and system resources; these changes could be triggered by
the mobility of resources or by software/hardware failures in the ADViCE
resources.
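To make the three scheduling criteria concrete, the sketch below combines them in one greedy pass. It is only an illustration under stated assumptions, not the ARS algorithm: the function name `schedule`, the integer security levels, the machine names, and the timing values in `perf_db` are all hypothetical. Machines are first filtered by security level, then ranked by predicted finish time using per-platform estimates of the kind kept in the task performance database, and finally the best `redundancy` machines are assigned to each task.

```python
def schedule(tasks, machines, perf_db, min_security=0, redundancy=1):
    """Map each task to `redundancy` machines.

    tasks:    list of task names, in execution order
    machines: dict machine name -> {"platform": str, "security": int}
    perf_db:  dict (task, platform) -> estimated execution time in seconds
    """
    # Security-based step: keep only machines that meet the requirement.
    eligible = {m: info for m, info in machines.items()
                if info["security"] >= min_security}
    if len(eligible) < redundancy:
        raise ValueError("not enough eligible machines for the redundancy level")

    load = {m: 0.0 for m in eligible}  # time already committed per machine
    mapping = {}
    for task in tasks:
        def finish_time(m):
            return load[m] + perf_db[(task, eligible[m]["platform"])]

        # Performance-based step: rank by predicted finish time (name breaks ties).
        ranked = sorted(eligible, key=lambda m: (finish_time(m), m))
        # Fault-tolerance step: run the task on the best `redundancy` machines.
        chosen = ranked[:redundancy]
        for m in chosen:
            load[m] = finish_time(m)
        mapping[task] = chosen
    return mapping


# Hypothetical resource and performance data.
machines = {
    "sparc1": {"platform": "sparc", "security": 2},
    "sparc2": {"platform": "sparc", "security": 2},
    "rs6000a": {"platform": "rs6000", "security": 1},
}
perf_db = {
    ("MULT_V", "sparc"): 4.0, ("MULT_V", "rs6000"): 3.0,
    ("LU", "sparc"): 6.0, ("LU", "rs6000"): 5.0,
}
secure_plan = schedule(["MULT_V", "LU"], machines, perf_db, min_security=2)
redundant_plan = schedule(["MULT_V"], machines, perf_db, redundancy=2)
```

With `redundancy=k`, each task runs on k machines, so under this simple model up to k-1 of them can fail without losing the task.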

Application Management Service (AMS)

The AMS utilizes
standard management functions to control and manage the execution of parallel
and distributed applications. The AMS provides the ARS with management
information about ADViCE resources to optimize the allocation of application
tasks to the currently available ADViCE resources. The AMS also provides
a well-defined interface that enables other software modules (e.g., ARS, ACS,
ADS) to access any management information required to achieve real-time
adaptive services.

Application Control Service (ACS)

The ACS provides applications with the required services to set up, run, control,
and manage their execution within the ADViCE. The main ACS functions
include setting up the application execution environment, monitoring the application
execution, and collecting the task performance information required
for the visualization of the application execution. In setting up the application
execution environment, the ACS launches a proxy process (which we refer to as the
local-ACS) at each machine selected for the application execution according
to the Allocation Channel Table (ACT) generated by the ARS. This involves
setting  up  socket  connections  between  the  CMS  and  the  client  machines.  The 
local-ACS  periodically  updates  the  task  performance  database  and  notifies  the 
CMS  of  any  runtime  errors. 

Application Data Service (ADS)

The  ADS  provides  services  to  establish  high  speed  communication  data  paths 
between the application tasks. In addition, the ADS supports limited task management
functions such as data conversion, task migration, handling of user request
exceptions, and periodic monitoring of the task performance.

5.6  ADViCE  Adaptation  Approach 

One important goal of the ADViCE is to deliver an adaptive parallel and distributed computing
environment that can automatically modify its configuration based on changes
in the environment. These changes could be due to hardware failures, software failures,
mobility of resources, or bursty network traffic. The ADViCE adaptation approach follows
three important phases or steps: 1) Change Detection, 2) Analysis and Verification,
and 3) Adaptation Plan. This approach is similar to the adaptation approach proposed
to achieve fault-tolerant distributed computing [32]. For each ADViCE service (AES,
AVS, ARS, ACS, ADS), we develop the appropriate algorithms to detect changes in
the service once they occur, to analyze and verify the detected changes, and
finally to carry out the steps defined in the adaptation plan associated with that service.
Figures 14, 15, 16, and 17 show the ADViCE Adaptation Algorithm and procedures.

The Application Execution Environment (AE(App_i)) denotes all the resources allocated
to run application App_i. While the application is running (Step 1 in the ADViCE
Adaptation Algorithm of Figure 14), the ACS monitors all the ADViCE services (Steps 2
through 26 in the ADViCE Adaptation Algorithm of Figure 14) associated with that application
to detect any possible changes or deterioration in the application performance.
Once any change is detected, the change detection procedure associated with the service
that has experienced the change is invoked (Steps 4, 10, 16, and 22 in the ADViCE
Adaptation Algorithm of Figure 14). For example, assume that during application development
the mobile user has experienced an excessive delay because the AES service is
running on a VES server that is outside the current location of the mobile user. This
is detected when the AES monitoring routine discovers that the communication delay to
the VES server is larger than the acceptable Dmax (Step 1 in Change_Detection_AES of
Figure 15). Once that delay is detected, the Verification and Analysis procedure for that
service is invoked (Step 6 in the ADViCE Adaptation Algorithm of Figure 14). In a
similar manner, we devise detection algorithms for each service offered by the ADViCE
servers (VES and CMS), as shown in Figure 15.

The Verification and Analysis procedures shown in Figure 16 involve analyzing the
current state of the system resources by using the AMS services to confirm that the
detected changes are real, and not false or transient, and to identify accurately
the event(s) that caused them. For example, suppose the change detection procedure of the ADS
has determined the EventType to be “link failure” (Step 4 in Change_Detection_ADS
of Figure 15). This event could be caused by the machine being down or by a task failure
(Steps 11 through 18 in Verification_Analysis_ADS of Figure 16). The verification and
analysis could be as simple as reading the OperStatus in the interface MIB associated with
each communication link used for the inter-task communications. If the link failure is found
to be caused by a machine failure, then the EventCause is assigned as “machine down” and
the Adaptation Plan associated with the ADS is invoked as shown in Figure 14 (Step
25).

The Adaptation Plan procedures involve taking the appropriate actions to enable
the ADViCE to adapt to the changes that have been detected and verified. The adaptation
plan procedure invokes the appropriate operations associated with the adaptation of
each service. For example, the adaptation plan for the ADS associated with “task down”
could be to restart the application execution from the beginning (Step 17 in
Adaptation_Plan_ADS of Figure 17).

procedure ADViCE_Adaptation_Algorithm
1   while ( AE(App_i) is running ) do {
2     monitor ADViCE_Services
3     monitor AES
4       EventType ← Change_Detection_AES()
5       if EventType ≠ Normal
6         EventCause ← Verification_Analysis_AES(EventType, AE(App_i))
7         Adaptation_Plan_AES(EventCause, AE(App_i))
8       endif
9     monitor AVS
10      EventType ← Change_Detection_AVS()
11      if EventType ≠ Normal
12        EventCause ← Verification_Analysis_AVS(EventType, AE(App_i))
13        Adaptation_Plan_AVS(EventCause, AE(App_i))
14      endif
15    monitor ACS
16      EventType ← Change_Detection_ACS()
17      if EventType ≠ Normal
18        EventCause ← Verification_Analysis_ACS(EventType, AE(App_i))
19        Adaptation_Plan_ACS(EventCause, AE(App_i))
20      endif
21    monitor ADS
22      EventType ← Change_Detection_ADS()
23      if EventType ≠ Normal
24        EventCause ← Verification_Analysis_ADS(EventType, AE(App_i))
25        Adaptation_Plan_ADS(EventCause, AE(App_i))
26      endif
27  } endwhile
end of ADViCE_Adaptation_Algorithm


Figure 14: ADViCE Adaptation Algorithm
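The control flow of Figure 14 can be expressed compactly if each service supplies its own detection, verification, and adaptation routines. The Python sketch below is an illustration only: the function name `adaptation_loop`, the callable-based interface, and the convention that a verifier returns `None` for a false or transient event are assumptions, not part of the ADViCE prototype.

```python
NORMAL = "Normal"


def adaptation_loop(is_running, services):
    """Run detect -> verify -> adapt for each service while the app runs.

    is_running: callable returning True while AE(App_i) is running
    services:   dict name -> (detect, verify, adapt) callables
    Returns a log of (service, event_type, event_cause) for handled events.
    """
    log = []
    while is_running():
        for name, (detect, verify, adapt) in services.items():
            event_type = detect()
            if event_type != NORMAL:
                event_cause = verify(event_type)
                # Assumption: verify returns None for a false or transient event.
                if event_cause is not None:
                    adapt(event_cause)
                    log.append((name, event_type, event_cause))
    return log


# Stubbed run: two monitoring passes, a link failure seen on the first one.
ticks = iter([True, True, False])
events = iter(["link failure", NORMAL])
handled = adaptation_loop(
    lambda: next(ticks),
    {"ADS": (lambda: next(events),
             lambda event_type: "machine down",
             lambda event_cause: None)},
)
```

Passing the per-service routines in a dictionary keeps the loop itself identical for AES, AVS, ACS, and ADS, which mirrors the repeated block structure of the figure.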

procedure Change_Detection_AES()
1   if Dconnect(VES) > Dmax
2     EventType = unacceptable delay to VES
3   else if unable to locate VES
4     EventType = VES down
5   else if unable to locate the database server
6     EventType = database down
7   else
8     EventType = Normal
9   endif
10  return(EventType)
end of Change_Detection_AES


procedure Change_Detection_ADS()
1   if inter task communication delay > Dmax
2     EventType = inter task communication delay
3   else if broken pipe detected
4     EventType = link failure
5   else
6     EventType = Normal
7   endif
8   return(EventType)
end of Change_Detection_ADS


Figure 15: ADViCE Change Detection Procedures
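As a rough executable counterpart to Figure 15, the two procedures can be written as pure functions over already-measured values. This is a sketch under assumptions: how the delay is measured and how reachability is probed is not specified in the text, so the parameters (`delay_to_ves`, `ves_reachable`, `db_reachable`, `broken_pipe`) are hypothetical inputs supplied by a monitoring routine.

```python
def change_detection_aes(delay_to_ves, ves_reachable, db_reachable, d_max):
    """Return the AES EventType, checking conditions in the order of Figure 15.

    delay_to_ves is the measured delay in seconds, or None if no measurement
    could be taken (for example, because the VES did not answer).
    """
    if delay_to_ves is not None and delay_to_ves > d_max:
        return "unacceptable delay to VES"
    elif not ves_reachable:
        return "VES down"
    elif not db_reachable:
        return "database down"
    return "Normal"


def change_detection_ads(inter_task_delay, broken_pipe, d_max):
    """Return the ADS EventType for one monitoring pass."""
    if inter_task_delay > d_max:
        return "inter task communication delay"
    elif broken_pipe:
        return "link failure"
    return "Normal"


# Hypothetical readings with a 0.5 s delay threshold.
aes_event = change_detection_aes(0.8, True, True, d_max=0.5)
ads_event = change_detection_ads(0.1, True, d_max=0.5)
```

Keeping detection free of side effects makes it cheap to call on every monitoring pass; the more expensive checks belong in the verification step.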

procedure Verification_Analysis_AES(EventType, AE(App_i))
1   case EventType = unacceptable delay to VES
2     verify delay to VES
3     if measured delay to VES > Dmax
4       check if the delay is caused by the location of VES
5         EventCause = location change of VES
6       check if the delay is caused by the location of the user
7         EventCause = user’s location change
8       check if the delay is caused by heavy load VES
9         EventCause = heavily loaded VES
10    endif
11  case EventType = VES down
12    verify VES down by AMS MIB
13    if true
14      check if VES down is caused by the VES machine failure
15        EventCause = VES machine down
16    endif
17  case EventType = database down
18    verify database down by AMS MIB
19    if true
20      check if database down is caused by database machine down
21        EventCause = Application Repository database machine down
22      check if database down is caused by database server down
23        EventCause = Application Repository database server down
24    endif
25  return(EventCause)
end of Verification_Analysis_AES


procedure Verification_Analysis_ADS(EventType, AE(App_i))
1   case EventType = inter task communication delay
2     verify the communication delay
3     if measured inter task delay > Dmax
4       check if the delay is caused by heavy network traffic
5         EventCause = heavy traffic
6       check if the delay is caused by heavy load
7         EventCause = heavy CPU load
8     endif
9   case EventType = link failure
10    verify link failure by AMS MIB
11    if true
12      check if link failure is caused by machine down
13        EventCause = machine down
14      check if link failure is caused by task down
15        EventCause = task down
16    endif
17  return(EventCause)
end of Verification_Analysis_ADS


Figure 16: Verification and Analysis Procedures
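The verification step for the ADS might look as follows in Python. This is a hedged sketch rather than the procedure used in the prototype: the rule for rejecting transient delays (every re-measured sample must still exceed Dmax), the utilization threshold used to separate heavy traffic from heavy CPU load, and all parameter names are assumptions made for illustration. The string "down" stands in for the operational status value read from the interface MIB.

```python
def verification_analysis_ads(event_type, delay_samples, d_max,
                              link_oper_status, machine_up,
                              network_utilization, util_threshold=0.8):
    """Return an EventCause string, or None if the event is not confirmed."""
    if event_type == "inter task communication delay":
        # Re-measure: treat the event as transient unless every new
        # sample still exceeds the threshold.
        if not delay_samples or min(delay_samples) <= d_max:
            return None
        # Assumed rule: a busy network explains the delay, else blame the CPU.
        if network_utilization > util_threshold:
            return "heavy traffic"
        return "heavy CPU load"
    if event_type == "link failure":
        # Confirm the failure against the management information first.
        if link_oper_status != "down":
            return None
        return "task down" if machine_up else "machine down"
    return None


# Hypothetical readings: the link is reported down and the host is unreachable.
cause = verification_analysis_ads("link failure", [], 0.5,
                                  link_oper_status="down", machine_up=False,
                                  network_utilization=0.1)
```

Returning `None` for unconfirmed events lets the caller skip the adaptation plan when a change turns out to be false or transient.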

procedure Adaptation_Plan_AES(EventCause, AE(App_i))
1   case EventCause = location change of VES or
2        EventCause = user’s location change or
3        EventCause = heavily loaded VES or
4        EventCause = VES machine down
5     access the default VES
6     locate a new VES
7     transfer the information from current VES to a new VES
8   case EventCause = Application Repository database machine down
9     choose alternative Application Repository database
10  case EventCause = Application Repository database server down
11    start the database
end of Adaptation_Plan_AES


procedure Adaptation_Plan_ADS(EventCause, AE(App_i))
1   case EventCause = heavy traffic or
2        EventCause = heavy load or
3        EventCause = machine down
4     invoke ARS to assign a new machine
5     if migration required
6       task migration
7     endif
8     if partial recovery is possible
9       resume from the stopped task
10    else
11      resume from the task check pointed state
12    endif
13  case EventCause = task down
14    if partial recovery is possible
15      resume from the task check pointed state
16    else
17      start the application from the beginning
18    endif
end of Adaptation_Plan_ADS


Figure 17: Adaptation Plan Procedures
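The ADS adaptation plan can be read as a mapping from a verified EventCause to an ordered list of actions. The sketch below returns that list instead of performing the actions, so it can be inspected or tested; the function signature and the two boolean flags are assumptions, and the action strings simply restate the steps of the figure. Both "heavy load" and "heavy CPU load" are accepted because Figures 16 and 17 use different labels for the same cause.

```python
def adaptation_plan_ads(event_cause, migration_required=False,
                        partial_recovery_possible=False):
    """Return the ordered list of actions for a verified EventCause."""
    actions = []
    if event_cause in ("heavy traffic", "heavy load",
                       "heavy CPU load", "machine down"):
        actions.append("invoke ARS to assign a new machine")
        if migration_required:
            actions.append("migrate task")
        if partial_recovery_possible:
            actions.append("resume from the stopped task")
        else:
            actions.append("resume from the task checkpointed state")
    elif event_cause == "task down":
        if partial_recovery_possible:
            actions.append("resume from the task checkpointed state")
        else:
            actions.append("start the application from the beginning")
    return actions


# A machine failure that forces migration, with no partial recovery.
plan = adaptation_plan_ads("machine down", migration_required=True)
```

An unknown cause yields an empty list, which a caller can treat as "no adaptation needed".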
6 ADViCE Testbed: Experimental Results and Discussion


The  current  ADViCE  prototype  consists  of  two  sites,  one  at  Syracuse  University  and  the 
other  at  Rome  Laboratory,  that  are  connected  by  the  OC3  ATM  Wide  Area  Network,  as 
shown  in  Figure  18.  We  are  currently  setting  up  a  new  site  at  the  University  of  Arizona. 
Each  site  or  domain  has  two  ADViCE  servers  that  manage  the  computing  and  network 
resources  available  in  their  site.  At  the  Syracuse  University  site  there  are  three  computing 
clusters:  HPDC,  CAT,  and  TOP.  The  HPDC  cluster  consists  of  several  ATM  switches 
and  ATM  concentrators  that  connect  high-performance  workstations  and  PCs  at  a  rate 
of 155 and 25 Mbps, respectively (URL: http://www.atm.syr.edu). The TOP and CAT
clusters  have  SUN  SPARCs,  SUN  IPXs  and  IBM  RS6000s  that  are  connected  to  the  ATM 
cluster  through  an  Ethernet  network.  The  Rome  Lab  site  consists  of  three  clusters  that 
include  SUN,  Digital,  and  HP  workstations. 


[Figure 18 diagram; recoverable labels: SUN Cluster, SU Domain, Rome Lab Domain]


Figure  18:  The  configuration  of  the  current  ADViCE  Testbed. 

In  this  section  we  discuss  and  evaluate  the  performance  of  three  important  functions 
supported  by  the  ADViCE  prototype:  1)  Task  Performance  Evaluation  Tool,  2)  Problem- 
Solving  Environment,  and  3)  Adaptation  Support. 

6.1 Experiment 1: Using ADViCE as a Parallel Evaluation Tool

In this experiment we used the matrix-vector multiplication (MULT_V) task as a running
example to evaluate the use of the ADViCE prototype as an evaluation tool to analyze the
performance of different configurations when the number of computers, network types,
and problem sizes are changed. We compared the time and effort required to perform such
tasks with and without using the ADViCE prototype. We benchmarked the sequential
and parallel algorithms of matrix-vector multiplication (MULT_V) based on various machine
and network configurations and problem sizes. The parallel implementation of the
MULT_V (A x B = C) task is based on the host-node programming model. The host
process distributes the rows of matrix A evenly among the node processes (where each process
runs on one workstation) while all the node processes receive the entire B matrix. Each
node process computes its part of the result matrix C and sends it back to the host process.
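The host-node decomposition described above can be sketched in a few lines of Python. This is an illustration of the row-block distribution only, not the p4 or NCS implementation used in the experiments: a thread pool stands in for processes on separate workstations, B is treated as a vector, and the function names are hypothetical. In standard CPython the threads will not speed up this pure-Python arithmetic; the point is the data distribution and the ordered gathering of results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, n_workers):
    """Return (start, stop) row ranges whose sizes differ by at most one."""
    base, extra = divmod(n_rows, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def node_multiply(a_block, b):
    """Node step: multiply a block of rows of A by the whole vector B."""
    return [sum(a_ij * b_j for a_ij, b_j in zip(row, b)) for row in a_block]


def mult_v(a, b, n_workers):
    """Host step: distribute row blocks, then gather partial results in order."""
    ranges = split_rows(len(a), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(node_multiply, a[lo:hi], b) for lo, hi in ranges]
        c = []
        for future in futures:  # iterate in submission order to keep rows ordered
            c.extend(future.result())
    return c


a = [[1, 2], [3, 4], [5, 6]]
b = [1, 1]
c = mult_v(a, b, n_workers=2)
```

Because every node needs all of B but only its own rows of A, the communication cost per node is one copy of B plus its share of A and C, which is why the network type matters in the measurements that follow.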

The ADViCE provides a web-based, user-friendly interface that allows a novice programmer
to experiment with and evaluate different parallel configurations of each ADViCE
task in a few minutes. We argue that performing similar evaluation tasks is almost
impossible for novice programmers and requires hours or even days of work
by a programmer who is expert in parallel processing, message passing, and visualization tools.
Using the ADViCE prototype, once a task is registered in the ADViCE task library, the user
can use that task or any other library task by just clicking on the task name in the
Application Editor window. Once the task is selected, the user can specify the desired
configuration to run the selected task, that is, the number of computers to be involved in
the computation and the network to be used to connect them if the task is going to run
in parallel. Selecting the ADViCE task and specifying how it will be implemented can be
done  in  a  few  minutes.  Once  that  is  done,  the  task  configuration  can  be  executed  and  its 
execution  time  can  be  visualized  immediately  without  any  effort  other  than  clicking  on 
the  execute  and  visualize  buttons  in  the  Application  Editor  window. 

Figure 19 shows the execution times of the ADViCE matrix-vector multiplication task for two problem sizes, 512 x 512 and 1024 x 1024. The results for a p4-based implementation of the same multiplication algorithm are given in Figure 20. The experiments were run on one, two and four Sun SPARCs connected by an IP/ATM network. We also evaluated the performance of the MULT_V task on a heterogeneous cluster of four Sun SPARCs and four IBM RS6000 workstations. The objective of such an evaluation is to give users a better understanding of how the performance of parallel algorithms changes with the problem size, the number of nodes, or the network type. As an example, for the p4-based implementation of the matrix-vector multiplication algorithm, we can determine from Figure 20 that eight nodes provide the best performance among the test cases.


Table 4 compares the times required to develop, compile, execute, and visualize the Matrix-Vector Multiplication task using p4 and the ADViCE prototype for a 1024 x 1024 problem size with four nodes. In the design and implementation phase, it takes around 862 minutes for a parallel programming expert to develop a p4-based multiplication program (431 lines) from scratch if we assume a programming speed of two minutes per line. A programmer with no p4 experience will need additional time to learn the tool before implementing the parallel algorithm. With ADViCE, a user who has no knowledge of parallel programming but wants to run the application in parallel only needs to choose the parallel option in the task configuration window of the Application Editor. The total time for developing the ADViCE MULT_V application is 2.10 minutes, rather than the 862 minutes needed to develop the application from scratch.

Figure 19: The Performance of the ADViCE Implementation of the Matrix-Vector Multiplication.

Figure 20: The Performance of the P4 Implementation of the Matrix-Vector Multiplication.

Table 4: The performance comparison of the matrix-vector multiplication task for each software development phase

Phase                          p4                      ADViCE
Design and development         862 min. (431 lines)    2.10 min.
Compilation                    7.01 sec.               0 sec.
Runtime setup                  0.980 sec.              0.015 sec.
Task execution                 0.194 sec.              0.136 sec.
Visualization and evaluation   1890 sec.               0.095 sec.

The location of the executable for the MULT_V task on the selected resource is provided in the resource allocation information, which is retrieved from the task constraints table. The executable is then linked to the I/O module, so ADViCE needs no compilation step (0 seconds in Table 4). Compiling the p4 version of the MULT_V program takes 7.01 seconds. The runtime setup time for the ADViCE prototype consists of the time it takes the ACS to transfer the activation and resource allocation information to the ADS plus the time for the acknowledgment. This setup takes 0.015 seconds for the MULT_V task on the selected resource. For a p4 application, the user creates a configuration file, i.e., the procgroup file, and manually links it to the p4 application, which takes 0.98 seconds. ADViCE runs the application automatically with the “Execute Application” button and generates the results in the selected output file. The execution time of the MULT_V task is 0.136 seconds when it is executed on four nodes over the ATM network. The execution time is 0.194 seconds using a p4 program with the same configuration.

In addition, ADViCE provides dynamic and post-mortem visualization of the application. The user can visualize the loads of all the machines in one domain and can focus on the load information for the machines selected to run a given application. Furthermore, the execution time of each module within an application is visualized in ADViCE. It takes 0.095 seconds to invoke the ADViCE visualization window for the MULT_V task. A p4 user who wants to visualize the execution time and compare its performance with others must use a separate graphics tool. The visualization and evaluation time depends on which tool is used; as an example, “gnuplot” takes 1890 seconds.
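To see how the individual phases add up, the short sketch below totals the Table 4 entries after converting everything to seconds. The dictionary layout and the `totals` helper are illustrative assumptions; only the numbers come from Table 4.

```python
# Per-phase times from Table 4 as (p4, ADViCE), all converted to seconds.
TABLE_4 = {
    "design and development": (862 * 60, 2.10 * 60),
    "compilation": (7.01, 0.0),
    "runtime setup": (0.980, 0.015),
    "task execution": (0.194, 0.136),
    "visualization and evaluation": (1890.0, 0.095),
}


def totals(table):
    """Sum each column of the table; returns (p4_seconds, advice_seconds)."""
    p4 = sum(p4_time for p4_time, _ in table.values())
    advice = sum(advice_time for _, advice_time in table.values())
    return p4, advice


p4_total, advice_total = totals(TABLE_4)
print(f"p4: {p4_total:.3f} s, ADViCE: {advice_total:.3f} s")
print(f"ratio: {p4_total / advice_total:.0f}x")
```

The totals come to about 53,618 seconds for p4 and about 126 seconds for ADViCE. In both columns the design and development phase dominates, so the comparison is driven mainly by development effort and only marginally by the sub-second difference in task execution time.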

6.2  Experiment  2:  Using  ADViCE  as  a  Problem  Solving  Environment 

In this experiment we demonstrate how ADViCE can enable a novice programmer to develop large-scale parallel and distributed applications running on geographically distributed heterogeneous resources. Implementing such applications is currently a challenging and time-consuming programming problem, even for experts in parallel and distributed programming tools. A distributed application can be viewed as an Application Flow Graph (AFG), whose nodes denote computational tasks and whose links denote the communication and synchronization between these nodes. Without an application development tool, a developer or development team must spend much effort and time to develop a distributed application from scratch. To overcome these difficulties, ADViCE provides an integrated problem solving environment that enables novice users to develop large-scale, complex, distributed applications using ADViCE tasks. The Linear Equation Solver (LES) application has been selected as a running example. Figure 12 shows the AFG of the Linear Equation Solver, which consists of an LU Decomposition (LU) task, two Matrix Inversion (INV) tasks and Matrix-Vector Multiplication (MULT_V) tasks. The problem size for this experiment is 1024 x 1024 and its execution environment consists of four nodes, which are Sun SPARCs and IBM RS6000 machines connected by an ATM network.

Table 5: The performance comparison of the linear equation solver application for each software development phase (a)

                            p4                                      ADViCE
Phase                       LU           INV          MULT_V       LU          INV         MULT_V
Design and development      838 min.     1314 min.    862 min.     2.10 min.   1.57 min.   2.30 min.
                            (419 lines)  (657 lines)  (431 lines)
Compilation                 6.45 sec.    8.10 sec.    7.01 sec.    0 sec.      0 sec.      0 sec.
Runtime setup               1.200 sec.   1.580 sec.   0.980 sec.   0.043 sec. (b)
Task execution              0.386 sec.   0.556 sec.   0.194 sec.   0.801 sec.  1.360 sec.  0.140 sec.
Application execution       1.691 sec.                             1.451 sec.
Application visualization   3200 sec.                              0.140 sec.

Table 5 compares the timing of several software phases for the Linear Equation Solver application using p4 and ADViCE. A user with enough knowledge about parallel programming and the p4 tool will spend 838 minutes on an LU task, 1314 minutes on an INV task, and 862 minutes on a MULT_V task. The total time to develop this application is approximately 3014 minutes (i.e., around 50 hours). Using ADViCE, a novice user spends around six minutes to develop such an application. There is no compile time in ADViCE, but a p4 application needs 21 seconds for compilation. The ADViCE setup time for a Linear Equation Solver application is 0.043 seconds. The p4 user must create all procgroup files and launch them in order, which takes around eight seconds.
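The aggregate figures quoted above follow directly from the per-task entries of Table 5 and the assumed programming speed of two minutes per line. The sketch below reproduces that arithmetic; the variable names are illustrative, and each task type is counted once, as in the text.

```python
MINUTES_PER_LINE = 2  # programming speed assumed in the text

# Per task, from Table 5:
# (lines of p4 code, p4 compile time in s, ADViCE development time in min)
TASKS = {
    "LU": (419, 6.45, 2.10),
    "INV": (657, 8.10, 1.57),
    "MULT_V": (431, 7.01, 2.30),
}

p4_dev_min = sum(lines * MINUTES_PER_LINE for lines, _, _ in TASKS.values())
p4_compile_s = sum(compile_s for _, compile_s, _ in TASKS.values())
advice_dev_min = sum(dev_min for _, _, dev_min in TASKS.values())

print(f"p4 development: {p4_dev_min} min ({p4_dev_min / 60:.1f} h)")
print(f"p4 compilation: {p4_compile_s:.2f} s")
print(f"ADViCE development: {advice_dev_min:.2f} min")
```

The sums are 3014 minutes (about 50.2 hours) of p4 development, 21.56 seconds of p4 compilation, and 5.97 minutes of ADViCE development, consistent with the rounded values in the paragraph.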

Since  the  ADViCE  is  based  on  the  data  flow  model  and  executes  the  application  tasks 
concurrently,  the  application  execution  time,  including  the  setup  time,  is  less  than  the 
summation  of  all  the  individual  task  execution  times.  In  our  experiment  with  the  Linear 
Equation  Solver  application,  the  total  execution  time  of  the  p4  implementation  using  four 
nodes is 1.691 seconds. The ADViCE implementation of the same application with the same configuration takes approximately 1.451 seconds.
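Why concurrent, data-flow execution can finish sooner than the sum of the task times can be illustrated with a small graph calculation. The sketch below is an assumption-laden example, not the ADViCE scheduler: the graph shape is a hypothetical simplification of the Linear Equation Solver AFG, the durations are illustrative values rather than the measurements in Table 5, and every task is assumed to have its own resource.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical AFG: each task maps to the set of tasks whose output it needs.
AFG = {
    "LU": set(),
    "INV_L": {"LU"},
    "INV_U": {"LU"},
    "MULT_V": {"INV_L", "INV_U"},
}

# Illustrative durations in seconds (not taken from Table 5).
DURATION = {"LU": 0.4, "INV_L": 0.6, "INV_U": 0.6, "MULT_V": 0.2}


def serial_time(duration):
    """Total time if the tasks run one after another."""
    return sum(duration.values())


def dataflow_time(afg, duration):
    """Finish time when each task starts as soon as all its inputs are ready."""
    finish = {}
    for task in TopologicalSorter(afg).static_order():
        ready = max((finish[dep] for dep in afg[task]), default=0.0)
        finish[task] = ready + duration[task]
    return max(finish.values())


print(f"serial:    {serial_time(DURATION):.2f} s")
print(f"data flow: {dataflow_time(AFG, DURATION):.2f} s")
```

In this example the two inversion tasks overlap, so the data-flow finish time equals the longest dependency chain rather than the sum of all four durations.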

Notes to Table 5: (a) The last two rows of the table are for the total time of the application. (b) This is the total setup time for an ADViCE-based linear equation solver application.

6.3 Experiment 3: Evaluation of the ADViCE Adaptation Approach

One of the main features of the ADViCE prototype is its transparent adaptation support. In this experiment, we evaluate the performance of the ADViCE prototype in developing a fault tolerant distributed application, shown in Figure 21.

Figure 21: An Example of Fault Tolerant Distributed Application.

A user first develops an application using the ADViCE Application Editor window (AES) and specifies that the application tasks should tolerate link and machine failures. During the application execution, we manually kill the process running one of the application tasks, say the INV task, as shown in Figure 21. The INV task failure is immediately detected by the Local ACS, which continuously monitors the execution of the INV task (Step 1 in the Detection and Analysis Phase of Figure 22). The error message is reported to the Server ACS running on the CMS (Step V and Step 2). The next step is to invoke the Verification-Analysis_ADS procedure running on the Server ACS of the CMS (Step 3), which determines that the EventCause is “Task down” (Step 15 in the Verification-Analysis_ADS procedure of Figure 16). Once that is determined, the Adaptation-Plan_ADS procedure is invoked. A simple recovery procedure could be to restart all the application tasks (LU, INV, and MULT_V). This recovery procedure involves invoking the ARS to reschedule resources to the application (see Step 1 in the Adaptation Phase of Figure 22). Once the ARS schedules the application tasks and passes the schedule to the Server ACS (Step 2), the Server ACS sets up the new application execution environment by starting the Local ACS on each machine selected to run the application (Step 3). Once that is done, each Local ACS starts the task execution on its machine (Step 4).
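The sequence of steps above (detect, report, verify the cause, then reschedule and restart) can be sketched as a small control flow. Everything in this sketch is hypothetical: the `ServerACS` class, its methods, and the round-robin `reschedule` function are stand-ins chosen for illustration and do not reproduce the ADViCE procedures or the ARS scheduling policy.

```python
from dataclasses import dataclass, field


def reschedule(tasks, machines):
    """Hypothetical scheduler stand-in: round-robin over available machines."""
    return {task: machines[i % len(machines)] for i, task in enumerate(tasks)}


@dataclass
class ServerACS:
    """Stand-in for the Server ACS; records a trace of the steps it takes."""
    tasks: list
    machines: list
    log: list = field(default_factory=list)

    def report(self, task, alive):
        # Detection and analysis: a Local ACS reports an event for `task`.
        self.log.append(f"event reported for {task}")
        cause = self.verify(alive)
        if cause == "Task down":
            return self.adapt()
        return None

    def verify(self, alive):
        # Placeholder for the verification and analysis procedure.
        cause = "No fault" if alive else "Task down"
        self.log.append(f"cause: {cause}")
        return cause

    def adapt(self):
        # Simple recovery plan from the text: reschedule and restart all tasks.
        mapping = reschedule(self.tasks, self.machines)
        for task, machine in mapping.items():
            self.log.append(f"restart {task} on {machine}")
        return mapping


if __name__ == "__main__":
    server = ServerACS(tasks=["LU", "INV", "MULT_V"], machines=["m1", "m2"])
    server.report("INV", alive=False)  # simulate killing the INV process
    print("\n".join(server.log))
```

Restarting every task keeps the recovery logic simple at the cost of redoing completed work; a finer-grained plan would restart only the failed task and its dependents.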

The performance of the adaptation algorithm depends on the Change Detection Time (CDT), Verification and Analysis Time (VAT), and Adaptation Plan Time (APT). The CDT measures the time it takes ADViCE to detect the change event in any of the ADViCE services. The VAT measures the time it takes ADViCE to verify the change event and determine its cause type. The APT measures the time it takes ADViCE to perform the operations specified in the adaptation plan associated with the affected service. For the example shown in Figure 22, the CDT is 7.675 seconds, the VAT is 5.328 seconds and the APT is 18.451 seconds. We are currently evaluating different techniques to achieve efficient implementations of all the procedures identified in the three phases of the ADViCE adaptation algorithm.

Figure 22: An Example of the ADViCE Adaptation Algorithm (panels: Detection and Analysis; Adaptation).
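As a rough worked example, the three reported times can be combined into an overall recovery overhead. This assumes the phases run strictly one after another, which the text does not state explicitly; the variable names are illustrative.

```python
# Phase times reported for the Figure 22 example, in seconds.
PHASES = {"CDT": 7.675, "VAT": 5.328, "APT": 18.451}

# Assumption: the phases are sequential, so the overhead is their sum.
total = sum(PHASES.values())

for name, seconds in PHASES.items():
    print(f"{name}: {seconds:.3f} s ({seconds / total:.1%} of total)")
print(f"total: {total:.3f} s")
```

Under that assumption the overhead is about 31.45 seconds, with the adaptation plan phase accounting for more than half of it, which suggests where optimization effort would pay off most.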


7  Conclusion 

We have presented the design and evaluation of the Virtual Distributed Computing Environment (VDCE) and the Adaptive Distributed Virtual Computing Environment (ADViCE), which have been developed at Syracuse University and the University of Arizona.

The VDCE consists of three main modules: Application Editor, Application Scheduler, and VDCE Runtime System. The Application Editor provides users with all the software tools and library functions required to develop a VDCE application. The main function of the Application Scheduler is the initial task-to-resource mapping and any necessary dynamic rescheduling. The VDCE Runtime System is based on the Control Virtual Machine (CVM) and the Data Virtual Machine (DVM). CVM provides a seamless interconnection of the resources and monitors the resources. DVM enables a high-performance communication medium among the application tasks.

We have successfully implemented a proof-of-concept prototype that supports all major components of the VDCE architecture. We are currently working on extending the current prototype in several ways: a) develop and implement an application programming interface (API) that enables users to add VDCE library tasks; b) add more sites to increase the computing services offered by VDCE; and c) develop and integrate mobile computing technology into VDCE so that users can access VDCE resources using mobile hosts and mobile interconnection networks.

We have also extended the VDCE prototype to support mobile computing and communication resources by developing the ADViCE prototype. The ADViCE consists of two main servers: Visualization and Editing Server (VES) and Control and Management Server (CMS). These two servers provide all the services required to develop parallel and distributed applications and to run, control, manage, and visualize their execution. We have successfully implemented a proof-of-concept prototype of the ADViCE architecture that provides most of the ADViCE services. We also presented our experimental results and an evaluation of how well the services supported by the ADViCE prototype achieve an efficient and seamless parallel and distributed programming environment. We are currently extending the capabilities of ADViCE to provide efficient adaptive scheduling algorithms and proactive management services.

We  are  currently  investigating  efficient  techniques  to  achieve  proactive  control  and 
management  of  all  services  offered  by  ADViCE  that  will  include  transparent  performance, 
fault-tolerance,  and  security  services  for  ADViCE  applications/users. 
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